Advancing Face Recognition with Zero-Shot Learning: A CLIP-Based Approach

Mohammed Al-Mukhtar
Ashraful Alam Nirob
MD Samiul Islam

Abstract

Face recognition technology has advanced significantly, yet many existing systems struggle to recognize new faces without large labeled datasets. This study introduces a custom CLIP-based framework that integrates Zero-Shot Learning (ZSL) so that faces can be recognized without extensive retraining. Using visual-text alignment, the model maps facial features and descriptive text into a shared embedding space, which lets it adapt to unseen identities. A projection layer added after feature extraction improves this alignment and, with it, recognition accuracy. Contrastive learning optimizes the image-text relationships, allowing the model to generalize to unseen classes while preserving its zero-shot capability. In extensive experiments, the framework achieved a validation accuracy of 0.9621, outperforming state-of-the-art models. In generalization tests on unseen classes, it attained 0.78 accuracy with high ROC-AUC scores, surpassing traditional face recognition models. The framework was also tested under varying image conditions, such as different resolutions and viewing angles, and maintained stable accuracy, indicating robustness in real-world scenarios. These results highlight its scalability and efficiency and make it well suited to applications that must handle unseen identities. The proposed framework thus offers a practical, high-performance approach to face recognition that addresses key limitations of existing systems and adapts to diverse and dynamic environments.
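The mechanism the abstract describes (projecting image features into a shared space, matching them against text embeddings, and training with a contrastive objective) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vectors standing in for encoder outputs, the fixed projection matrix, and the temperature of 0.07 are all assumptions chosen for illustration, and the loss shown is a generic symmetric InfoNCE loss of the kind used in CLIP-style training.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(features, weights):
    # Linear projection layer: each row of `weights` yields one output dimension.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def cosine_similarity(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def zero_shot_predict(image_features, text_embeddings, projection):
    # Pick the label whose text embedding is closest to the projected image.
    projected = project(image_features, projection)
    scores = {label: cosine_similarity(projected, emb)
              for label, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Symmetric InfoNCE: pair k of images should match pair k of texts.
    n = len(image_embs)
    logits = [[cosine_similarity(i, t) / temperature for t in text_embs]
              for i in image_embs]

    def cross_entropy(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_sum - row[k]
        return total / n

    columns = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))

# Hypothetical toy data: 3-d image features, 2-d shared embedding space.
projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_embeddings = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}

label, scores = zero_shot_predict([0.9, 0.1, 0.5], text_embeddings, projection)
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(label, aligned < swapped)
```

In this sketch a new identity is added by inserting one more entry into `text_embeddings`, with no retraining step, which is the zero-shot property the abstract refers to. The loss is lower when matched image and text embeddings line up than when they are swapped, which is the signal contrastive training uses to shape the shared space.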

Version published to 10.21203/rs.3.rs-6397028/v1 on Research Square
Oct 29, 2025

Related articles

Modelling and dense network model for moderate facial alignment prediction by feature representation

This article has 2 authors:
1. K.Gayathri
2. S.BaghyaShree
This article has no evaluations
Latest version Jan 7, 2026

FCL: Frequency-based Contrastive Learning for Generalizable Face Forgery Detection

This article has 5 authors:
1. Yu Zhu
2. Shengze Wang
3. Yufeng Gu
4. Ziming Zhu
5. Nan Wang
This article has no evaluations
Latest version Dec 25, 2025

Multimodal Model Based on Contrastive Language-Image Pretraining for Micro-Expression Recognition

This article has 5 authors:
1. Peng Yang
2. Xiaoguang Wu
3. Yanyang Zhou
4. Qilin Wei
5. Zhifeng Zeng
This article has no evaluations
Latest version Dec 17, 2025
