How AI Models Judge Facial Attractiveness: Visualizing Features with CNN and Vision Transformer to Understand Attractiveness Factors

Abstract

Facial attractiveness influences social life, and a series of experimental and computational studies has been conducted to provide insight into these effects. In particular, deep-learning studies have been used to predict facial attractiveness and explore its contributing factors. Although convolutional neural networks (CNNs) remain the mainstay of these analyses, the Vision Transformer (ViT) has recently attracted considerable attention. In this study, facial attractiveness prediction models were constructed using CNN-based models or a ViT, and the facial regions each model focused on were identified. The CNNs focused on local features, particularly facial parts such as the eyes, nose, and mouth, whereas Visual Geometry Group 19 (VGG19), a CNN with a deeper architecture, covered a wider range. In contrast, the ViT focused on global features, such as skin texture and facial contours, suggesting that it may reflect holistic processing based on the relationships between facial parts. Because the facial features extracted by deep learning models differ by architecture, research on the factors underlying facial attractiveness should adopt an approach that considers both local and global features, chosen according to the research objective.
