The Guideline for Building Fair Multimodal Medical AI with Large Vision-Language Model
Abstract
Multimodal medical artificial intelligence (AI) is increasingly recognized as superior to traditional single-modality approaches due to its ability to integrate diverse data sources, an approach that aligns closely with clinical diagnostic processes. However, the impact of multimodal information interactions on model fairness remains unknown, posing a critical challenge for equitable AI deployment in healthcare. Here, we extend fairness research to multimodal medical AI and leverage large-scale medical vision-language models (VLMs) to provide guidelines for building fair multimodal AI. Training on large and diverse datasets enables medical VLMs to discern variances across populations, thereby offering more equitable insights than single data sources. Our analysis covers three key medical domains—dermatology, radiology, and ophthalmology—focusing on how patient metadata interacts with medical images to affect model fairness across dimensions such as gender, age, and skin tone. Our findings reveal that the indiscriminate inclusion of all metadata may negatively impact fairness for protected subgroups, and they show how multimodal AI utilizes demographic information in metadata to influence fairness. In addition, we conduct an in-depth analysis of how clinical attributes affect model performance and fairness, covering more than 20 different attributes in dermatology. Finally, we propose a fairness-oriented metadata selection strategy that uses recent advancements in large medical VLMs to guide attribute selection. Remarkably, we find that the fairness correlations computed by the medical VLM closely align with our experimental results, which required over 500 GPU hours, demonstrating a resource-efficient approach to guiding multimodal integration. Our work underscores the importance of careful metadata selection in achieving fairness in multimodal medical AI.
We anticipate that our analysis will serve as a starting point for more sophisticated and fair multimodal medical AI models.
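The metadata selection strategy described above hinges on measuring how fairly a model performs across protected subgroups (e.g., gender, age, skin tone) when different metadata attributes are included. As a minimal sketch of one such comparison, here is a max–min accuracy gap across subgroups; the function name and the specific gap metric are our illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def subgroup_fairness_gap(y_true, y_pred, groups):
    """Max-min accuracy gap across protected subgroups.

    A smaller gap indicates more equitable performance across groups.
    This is one simple fairness notion, used here only for illustration;
    the study evaluates fairness along several dimensions (gender, age,
    skin tone) and this metric is an assumption, not the authors' method.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)

    # Per-group accuracy: fraction of correct predictions within each subgroup.
    accuracies = [
        (y_true[groups == g] == y_pred[groups == g]).mean()
        for g in np.unique(groups)
    ]
    return float(max(accuracies) - min(accuracies))

# Hypothetical usage: a model that is perfect on group "f" but only
# 50% accurate on group "m" yields a gap of 0.5.
gap = subgroup_fairness_gap(
    y_true=[1, 0, 1, 0],
    y_pred=[1, 0, 0, 0],
    groups=["f", "f", "m", "m"],
)
print(gap)  # → 0.5
```

In a metadata selection loop, a model would be trained with each candidate attribute subset, and the subset minimizing such a gap (subject to an accuracy constraint) would be preferred; the paper's contribution is that VLM-computed fairness correlations can approximate this search without the full training cost.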