Lack of children in public medical imaging data points to growing age bias in biomedical AI
Abstract
Background. Artificial intelligence (AI) is transforming healthcare, but its benefits have not been equitably distributed, and children have been particularly overlooked. Only 17% of FDA-approved medical AI devices are labeled for pediatric use. We hypothesized that this disparity may stem from a fundamental gap in pediatric medical imaging data. Methods. To test this hypothesis, we performed a systematic review of 180 publicly available medical imaging datasets to assess pediatric data representation. To identify the primary data sources used for methods development, we then surveyed papers from a machine learning imaging conference. Finally, we evaluated the performance of adult-trained chest radiograph models when applied to pediatric populations to quantify potential age-related bias. Results. Our systematic review found that children represent less than 1% of the data in public medical imaging datasets. The majority of the machine learning conference papers we surveyed relied on publicly available data for model development. Furthermore, we found that adult-trained chest radiograph models exhibit significant age bias when applied to pediatric populations, with higher false positive rates in younger children. Discussion. This study highlights the urgent need for increased pediatric representation in publicly accessible medical datasets. Our findings suggest that the lack of pediatric data may contribute both to the scarcity of AI tools for children and to the poor performance of adult-trained models in this population. We provide actionable recommendations for researchers, policymakers, and data curators to address this age equity gap and mitigate the potential harms of AI systems not trained on pediatric patients.
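The age-bias result above is stated in terms of false positive rates, so a minimal sketch of how such a per-age-group rate could be computed may help. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `false_positive_rate_by_group`, the age bands, and the toy labels and predictions are all hypothetical and carry no data from the study.

```python
from collections import defaultdict


def false_positive_rate_by_group(labels, predictions, groups):
    """Return {group: FP / (FP + TN)} for binary labels and predictions.

    Hypothetical helper for illustration only. Groups that contain no
    negative cases are omitted, since their rate is undefined.
    """
    false_pos = defaultdict(int)
    true_neg = defaultdict(int)
    for y_true, y_pred, group in zip(labels, predictions, groups):
        if y_true == 0:  # only disease-negative cases enter the FPR
            if y_pred == 1:
                false_pos[group] += 1
            else:
                true_neg[group] += 1
    group_names = set(false_pos) | set(true_neg)
    return {
        g: false_pos[g] / (false_pos[g] + true_neg[g]) for g in group_names
    }


# Made-up toy data (assumption): 1 = finding present, 0 = finding absent.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
# Hypothetical age bands in years; the study's own banding may differ.
groups = ["0-5"] * 4 + ["6-17"] * 4 + ["0-5", "6-17"]

rates = false_positive_rate_by_group(labels, predictions, groups)
for band in sorted(rates):
    print(band, rates[band])
```

Comparing the per-band rates side by side is one simple way to see whether a model flags healthy younger children more often than healthy older ones.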