The Effect of Skin Tone Classification on Bias in Bruise Detection
Abstract
Background AI-based bruise detection tools have shown promise in injury detection but often underperform on darker skin tones. Most fairness evaluations in dermatology AI rely on a single method of skin tone categorization. In biased models, the choice of skin tone scale plays a critical role not only in how individuals are grouped but also in whether disparities become visible: a well-designed scale should reveal, rather than mask, differences in model performance across skin tones. This study evaluated how different skin tone classification methods affect accuracy, average precision, and fairness measures in bruise detection.
Methods Using a dataset of 11,766 bruise and non-bruise images collected under white and alternate light sources, six skin tone labeling methods were applied: Individual Typology Angle (ITA), the Fitzpatrick scale (full and binarized), the Monk Skin Tone (MST) scale, and two clustering approaches (chromatic and luminance-based). Chromatic clusters were created from the raw L* (lightness), a* (red-green), and b* (yellow-blue) values, whereas luminance clustering used only the L* values. A YOLOv5 model was retrained for bruise detection, and its accuracy, average precision, and fairness were assessed on the test set across these six skin tone groupings, with fairness measured by Demographic Parity Difference (DPD) and Equal Opportunity Difference (EOD).
Results Performance and fairness varied widely depending on the skin tone classification method used. Granular scales such as MST and the chromatic cluster-based scale revealed greater variability in both performance and fairness across skin tone groups, especially in the mid-to-dark range. In contrast, simpler and broader scales such as luminance-based clustering and binary Fitzpatrick showed more stable trends, but they may have masked important differences between skin tones.
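The Methods name Individual Typology Angle (ITA) labeling without giving its formula. ITA is conventionally computed from CIELAB values as ITA = arctan((L* - 50) / b*) x 180 / pi and is often binned with six commonly cited thresholds (usually attributed to Chardon et al.). The sketch below assumes those conventional thresholds and a single representative L*/b* pair per image (for example, averaged over a skin patch); the study's actual cut-offs and sampling procedure are not stated in the abstract, and `ita_degrees`, `ita_category`, and `ITA_BINS` are hypothetical names.

```python
import math

# ASSUMPTION: conventional ITA category thresholds in degrees (commonly cited);
# the study's own cut-offs may differ. Checked from lightest to darkest.
ITA_BINS = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]


def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*), converted to degrees."""
    if b_star == 0:
        raise ValueError("ITA is undefined when b* == 0")
    return math.degrees(math.atan((l_star - 50.0) / b_star))


def ita_category(ita: float) -> str:
    """Map an ITA angle to a category; angles at or below -30 degrees are 'dark'."""
    for threshold, label in ITA_BINS:
        if ita > threshold:
            return label
    return "dark"


# Hypothetical mean skin-patch values for three images.
for l, b in [(70.0, 10.0), (50.0, 10.0), (30.0, 20.0)]:
    angle = ita_degrees(l, b)
    print(f"L*={l}, b*={b} -> ITA={angle:.1f} deg, {ita_category(angle)}")
```

Because ITA collapses L* and b* into a single angle, two images with quite different colour can land in the same bin, which is one reason the choice among ITA, MST, and clustering-based labels can change the groups a fairness metric compares.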
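The Methods distinguish chromatic clusters (built from raw L*, a*, b*) from luminance clusters (built from L* alone). A minimal sketch of that distinction follows. It assumes plain k-means (Lloyd's algorithm) as the clustering method, because the abstract names neither the algorithm nor the number of clusters. It is written in pure Python so it runs without dependencies, and `kmeans`, `luminance_features`, and `chromatic_features` are hypothetical names. Clusters are relabelled by descending centroid L* so that cluster 0 is always the lightest.

```python
import math
import random
from typing import List, Sequence, Tuple

Lab = Tuple[float, float, float]  # (L*, a*, b*) for one image's skin region


def luminance_features(samples: Sequence[Lab]) -> List[Tuple[float, ...]]:
    """Luminance-based clustering uses L* only."""
    return [(l,) for l, _a, _b in samples]


def chromatic_features(samples: Sequence[Lab]) -> List[Tuple[float, ...]]:
    """Chromatic clustering uses the raw L*, a*, b* values."""
    return [tuple(s) for s in samples]


def kmeans(points: Sequence[Tuple[float, ...]], k: int,
           iters: int = 100, seed: int = 0) -> List[int]:
    """ASSUMPTION: plain k-means stands in for the study's unnamed clustering method.

    Returns one label per point, ordered so cluster 0 has the highest centroid L*.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []

    def nearest(p: Tuple[float, ...]) -> int:
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        new_labels = [nearest(p) for p in points]  # assignment step
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    # Relabel by descending centroid L* (the first coordinate in both feature sets).
    order = sorted(range(k), key=lambda c: -centroids[c][0])
    rank = {c: i for i, c in enumerate(order)}
    return [rank[lab] for lab in labels]


# Hypothetical skin-region colours: three lighter and three darker images.
samples = [(72.0, 8.0, 15.0), (70.0, 9.0, 16.0), (68.0, 10.0, 17.0),
           (35.0, 12.0, 20.0), (33.0, 13.0, 21.0), (31.0, 14.0, 22.0)]
print(kmeans(luminance_features(samples), k=2))  # [0, 0, 0, 1, 1, 1]
print(kmeans(chromatic_features(samples), k=2))  # [0, 0, 0, 1, 1, 1]
```

On cleanly separated toy data like this, the two feature sets agree. They diverge when images share similar lightness but differ in redness or yellowness (a*, b*), which only the chromatic features can separate.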
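DPD and EOD are named in the Methods but not defined in the abstract. One common multi-group formulation takes the gap between the highest and lowest group rates. DPD compares the rate of positive predictions, and EOD compares the true positive rate (recall on images that truly contain a bruise). The sketch below assumes that max-minus-min formulation. It also assumes each image's YOLOv5 output has already been reduced to a binary "bruise detected" prediction. Both choices, and all helper names, are illustrative rather than the study's confirmed procedure.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def selection_rates(y_pred: Sequence[int],
                    groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """P(prediction = 1 | group) for each skin tone group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, g in zip(y_pred, groups):
        totals[g] += 1
        positives[g] += pred
    return {g: positives[g] / totals[g] for g in totals}


def true_positive_rates(y_true: Sequence[int], y_pred: Sequence[int],
                        groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """P(prediction = 1 | true bruise, group); groups with no true bruises are skipped."""
    actual: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            actual[g] += 1
            hits[g] += p
    return {g: hits[g] / actual[g] for g in actual}


def demographic_parity_difference(y_pred, groups) -> float:
    """ASSUMPTION: max-minus-min gap in selection rate across groups."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


def equal_opportunity_difference(y_true, y_pred, groups) -> float:
    """ASSUMPTION: max-minus-min gap in true positive rate across groups."""
    rates = true_positive_rates(y_true, y_pred, groups).values()
    return max(rates) - min(rates)


# Hypothetical per-image labels (1 = bruise) and binary detector outputs.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["light"] * 4 + ["dark"] * 4
print(demographic_parity_difference(y_pred, groups))         # 0.5
print(equal_opportunity_difference(y_true, y_pred, groups))  # 0.5
```

A value of 0 means every group receives positive predictions (DPD) or has its true bruises detected (EOD) at the same rate. Larger values indicate a wider gap between the best- and worst-served groups under whichever skin tone grouping is passed in.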
Conclusions Skin tone classification plays a key role in how both performance and fairness are evaluated in bruise detection models. Granular skin tone scales such as MST and chromatic clustering may not yield the highest performance, but they reveal disparities more clearly, whereas broader scales may mask disparities despite appearing to perform well. Addressing these biases requires careful selection of skin tone grouping methods for evaluation.
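The point that broader scales can mask disparities can be illustrated with a toy example. The numbers here are invented for illustration and are not results from the study. Four hypothetical fine-grained groups (labelled 1 to 4, lightest to darkest) are merged into a hypothetical binary light/dark split. EOD is computed as the max-minus-min gap in true positive rate, which is an assumed formulation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def tpr_by_group(y_true: Sequence[int], y_pred: Sequence[int],
                 groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """True positive rate per group (only images with a true bruise count)."""
    actual: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            actual[g] += 1
            hits[g] += p
    return {g: hits[g] / actual[g] for g in actual}


def eod(y_true, y_pred, groups) -> float:
    """ASSUMPTION: EOD as the max-minus-min gap in true positive rate."""
    rates = tpr_by_group(y_true, y_pred, groups).values()
    return max(rates) - min(rates)


# Toy data invented for illustration: every image contains a bruise,
# two images per fine-grained group; groups 2 and 4 each miss one bruise.
fine = [1, 1, 2, 2, 3, 3, 4, 4]
y_true = [1] * 8
y_pred = [1, 1, 1, 0, 1, 1, 1, 0]
coarse = ["light" if g <= 2 else "dark" for g in fine]  # hypothetical binary split

print(tpr_by_group(y_true, y_pred, fine))    # {1: 1.0, 2: 0.5, 3: 1.0, 4: 0.5}
print(eod(y_true, y_pred, fine))             # 0.5
print(tpr_by_group(y_true, y_pred, coarse))  # {'light': 0.75, 'dark': 0.75}
print(eod(y_true, y_pred, coarse))           # 0.0
```

The detector's behaviour is identical in both cases; only the grouping changes. Averaging within the two broad groups cancels the shortfalls in groups 2 and 4, so the binary split reports no disparity while the four-group scale exposes a 0.5 gap.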