Advancing Face Recognition with Zero-Shot Learning: A CLIP-Based Approach
Abstract
Face recognition technology has advanced significantly, yet many existing systems struggle to recognize new faces without large labeled datasets. This study introduces a custom CLIP-based framework that integrates Zero-Shot Learning (ZSL) for face recognition without extensive retraining. By leveraging visual-text alignment, the model maps facial features and descriptive text representations into a shared embedding space, allowing it to adapt to unseen identities. A projection layer added after feature extraction strengthens this alignment and improves recognition accuracy. Contrastive learning further optimizes image-text relationships, enabling effective generalization to unseen classes while preserving zero-shot capability. In extensive experiments, the framework achieved a validation accuracy of 0.9621, outperforming the state-of-the-art models it was compared against. In generalization tests on unseen classes, it attained an accuracy of 0.78 with high ROC-AUC scores, surpassing traditional face recognition models. The framework was also tested under varying image conditions, such as different resolutions and viewing angles, and its accuracy remained stable, indicating robustness in real-world scenarios. These results highlight the framework's scalability and efficiency, making it well suited to applications that must handle unseen identities. The proposed framework offers a practical, high-performance solution for modern face recognition that addresses key limitations of existing systems and adapts to diverse, dynamic environments.
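The abstract names three mechanisms: a projection layer applied after feature extraction, contrastive learning over image-text pairs, and zero-shot recognition by matching a face embedding against text descriptions of identities. The following is a minimal pure-Python sketch of those ideas under stated assumptions, not the authors' implementation. It assumes image and text features have already been produced by some encoder; the projection weights `W` and the temperature of 0.07 are hypothetical placeholders; and the loss is assumed to be the symmetric InfoNCE objective popularized by CLIP, since the abstract does not state the exact loss used.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(features: Vector, weights: Matrix) -> Vector:
    """Linear projection layer into the shared embedding space, followed by L2 normalization.

    `weights` is a hypothetical (out_dim x in_dim) matrix; in a real model it would be learned.
    """
    return l2_normalize([sum(w * f for w, f in zip(row, features)) for row in weights])


def similarity_logits(images: Matrix, texts: Matrix, temperature: float) -> Matrix:
    """Pairwise dot products of unit-norm embeddings (cosine similarity), divided by a temperature."""
    return [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts] for img in images]


def _cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable softmax cross-entropy for a single row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(images: Matrix, texts: Matrix, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (assumed, CLIP-style): the i-th image is paired with the i-th text."""
    if len(images) != len(texts):
        raise ValueError("contrastive loss expects a batch of matched image-text pairs")
    logits = similarity_logits(images, texts, temperature)
    n = len(logits)
    # Image-to-text direction: each image should score highest against its own text.
    image_to_text = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each text should score highest against its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(_cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


def zero_shot_predict(image: Vector, class_texts: Matrix) -> int:
    """Return the index of the identity description whose text embedding is closest to the image."""
    scores = [sum(a * b for a, b in zip(image, txt)) for txt in class_texts]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 3-d features with an identity projection; all values are hypothetical.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    face = project([0.9, 0.1, 0.0], W)
    identity_texts = [l2_normalize([1.0, 0.0, 0.0]), l2_normalize([0.0, 1.0, 0.0])]
    print(zero_shot_predict(face, identity_texts))  # → 0
```

Because zero-shot prediction only compares the face embedding against text embeddings, a new identity can be added by encoding a new description and appending it to `identity_texts`, with no retraining of the projection. This mirrors the adaptability the abstract claims, although how the authors construct identity descriptions is not specified here.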