ISGD: A Dataset for Demographically-Aware Facial Analysis and Privacy-First Skincare Recommendation
Abstract
Facial attribute recognition plays a crucial role in applications ranging from human-computer interaction to personalised digital health. However, the effectiveness of existing systems is often limited by demographic bias in training data and by the absence of domain-specific annotations, particularly for nuanced tasks such as skincare and grooming analysis. Large-scale datasets such as CelebA are predominantly Western-centric and lack critical attributes, including Oily Skin, Wrinkles, and grooming-related characteristics. To address these limitations, we introduce the Indian Skincare and Grooming Dataset (ISGD), a manually curated dataset comprising 30,141 facial images from the Indian subcontinent, annotated with 33 fine-grained binary attributes designed specifically for skincare and grooming analysis. Building on ISGD, we propose AKRTI, a privacy-first inference pipeline that decouples visual processing from report generation. The system employs a ConvNeXt-Tiny backbone for multi-label facial attribute prediction. Importantly, only the predicted binary attribute vector, and never the raw facial image, is passed to a large language model (LLM) to generate a personalised, human-readable skincare and grooming report, thereby preserving user privacy. Experimental results demonstrate that models trained on ISGD significantly outperform those trained on a size-matched subset of CelebA, achieving 94.26% overall accuracy and an F1-score of 0.8851. Furthermore, per-attribute evaluation indicates more consistent and reliable predictions for skincare-critical features such as beard presence, skin condition, and wrinkles. By introducing a demographically representative dataset alongside a privacy-aware framework, this work establishes a robust foundation for equitable and practical AI-driven facial analysis systems in personalised healthcare and wellness.
The source code for all experiments and implementations is publicly available on GitHub at https://github.com/HimalRana2610/ISGD and is archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.18837811).
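The decoupling described in the abstract, in which the vision model emits a binary attribute vector and only that vector reaches the LLM, can be illustrated with a minimal sketch. This is not taken from the authors' repository: the three attribute label strings, the 0.5 threshold, the function names, and the prompt wording are all assumptions made for illustration, and the logits stand in for the output of a multi-label classifier such as the ConvNeXt-Tiny model.

```python
import math

# Illustrative subset only: ISGD defines 33 binary attributes, and the
# exact label strings used here are assumptions.
ATTRIBUTES = ["Oily Skin", "Wrinkles", "Beard"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_attributes(logits, threshold=0.5):
    """Convert per-attribute logits to a binary vector.

    Each attribute gets its own independent sigmoid, as is usual for
    multi-label prediction; the 0.5 threshold is an assumed default.
    """
    if len(logits) != len(ATTRIBUTES):
        raise ValueError("expected one logit per attribute")
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def build_report_prompt(attribute_vector):
    """Build the LLM prompt from the binary vector alone.

    The function never receives image data, so nothing derived from the
    raw face other than the attribute bits can reach the language model.
    """
    present = [n for n, bit in zip(ATTRIBUTES, attribute_vector) if bit]
    absent = [n for n, bit in zip(ATTRIBUTES, attribute_vector) if not bit]
    return (
        "Detected attributes: " + (", ".join(present) or "none") + ". "
        "Not detected: " + (", ".join(absent) or "none") + ". "
        "Write a personalised skincare and grooming report."
    )


# Hypothetical logits standing in for the classifier output.
vector = logits_to_attributes([2.1, -0.7, 0.3])
prompt = build_report_prompt(vector)
print(vector)
print(prompt)
```

Keeping the prompt builder's signature free of any image argument makes the privacy boundary explicit in the code itself: the only information crossing from the vision stage to the text stage is the list of attribute bits.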