Evaluating Algorithmic Fairness in Predicting Health Service Use and Unmet Need Across Socioeconomic and Caste Subgroups: Evidence from the Longitudinal Ageing Study in India
Abstract
Introduction
Persistent socioeconomic and caste inequalities in India drive disparities in healthcare access. Machine learning (ML) models offer promise for forecasting service use and unmet needs, but may perpetuate algorithmic bias against disadvantaged groups. We evaluated both performance and fairness of several ML algorithms across diverse caste and socioeconomic subgroups.
Methods
We used nationally representative data from India to develop machine learning models predicting outpatient care, hospitalization, and unmet healthcare need among older adults. We trained logistic regression, random forest, XGBoost, and LightGBM models using demographic, social, and health-related predictors. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. We assessed model performance using the area under the receiver operating characteristic curve (AUROC) and evaluated fairness across caste and income subgroups. Fairness strategies included removing sensitive features ("neutral" models) and training stratified models within subgroups. We used SHapley Additive exPlanations (SHAP) to identify the most influential predictors across outcomes.
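The two fairness strategies described above can be sketched as simple data-preparation steps: a "neutral" model drops the sensitive attributes before training, while stratified models partition the training records by subgroup. This is a minimal illustration on toy records; the field names (`caste`, `mpce`, `age`) are stand-ins, not the actual LASI variable names:

```python
def make_neutral(records, sensitive=("caste", "mpce")):
    """'Neutral model' input: drop sensitive features before training."""
    return [{k: v for k, v in r.items() if k not in sensitive}
            for r in records]

def stratify(records, key="caste"):
    """Stratified-model input: split records into one training set per subgroup."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return groups

# Toy usage (illustrative values only):
records = [
    {"caste": "SC", "mpce": 1, "age": 70},
    {"caste": "General", "mpce": 5, "age": 65},
]
neutral = make_neutral(records)     # sensitive columns removed
per_group = stratify(records)       # one dataset per caste group
```

Either prepared dataset would then feed the same model-training step (logistic regression, LightGBM, etc.), which is omitted here.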
Results
Among 55,962 older adults in India, 53.4% had at least one outpatient visit, 6.5% were hospitalized, and 7.9% reported unmet healthcare needs. Model performance varied across outcomes and groups. The best-performing model (LightGBM) achieved AUROCs of 0.78 for unmet need, 0.76 for outpatient care, and 0.70 for hospitalization. Predictive accuracy was higher in the lowest socioeconomic group (MPCE 1, AUROC = 0.79) compared to the highest (MPCE 5, AUROC = 0.75). Removing sensitive predictors such as caste or income had minimal impact (change in AUROC <0.02), and subgroup-specific models led to mixed results, with only marginal improvement for Scheduled Castes (AUROC from 0.78 to 0.80). Including social and health determinants substantially improved model performance (e.g., hospitalization AUROC increased from 0.57 to 0.70). Top predictors included self-rated health, region, grip strength, and socioeconomic status. Balancing techniques like SMOTE did not meaningfully enhance performance.
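The subgroup comparison reported above (AUROC computed separately within each caste or MPCE group) can be sketched with a small, dependency-free rank-based AUROC implementation. The labels, scores, and group names below are illustrative, not study data:

```python
from collections import defaultdict

def auroc(y_true, y_score):
    """Rank-based AUROC: Mann-Whitney U statistic divided by n_pos * n_neg."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # find the run of tied scores and assign each the average rank
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def subgroup_auroc(y_true, y_score, groups):
    """AUROC within each subgroup (e.g. caste category or MPCE quintile)."""
    by_group = defaultdict(lambda: ([], []))
    for t, s, g in zip(y_true, y_score, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(s)
    return {g: auroc(t, s) for g, (t, s) in by_group.items()}
```

Comparing the per-group values returned by `subgroup_auroc` is what reveals disparities such as the MPCE 1 vs. MPCE 5 gap reported above; in practice one would use a library routine (e.g. scikit-learn's `roc_auc_score`) rather than this hand-rolled version.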
Conclusions
Machine learning models can effectively predict healthcare use and unmet needs among older adults in India. Incorporating social and health determinants improves model accuracy, but eliminating bias requires structural changes beyond technical adjustments.
Fairness-aware model development and deployment are essential to ensure predictive tools contribute to more equitable healthcare systems.
What is already known on this topic
- Machine learning (ML) has shown promise in predicting healthcare use and unmet need, particularly in high-income settings.
- Structural inequalities such as caste and income may influence healthcare access, but few ML studies have evaluated how these social factors affect model performance.
- Fairness concerns in ML are increasingly recognized, yet methods to assess or address them in low- and middle-income country (LMIC) settings remain limited.
What this study adds
- This study evaluates ML model performance across caste and income subgroups using nationally representative data from older adults in India.
- It shows that model accuracy varies by subgroup, with better performance among Scheduled Castes and lower-income groups for predicting unmet need.
- Fairness interventions such as removing sensitive features or training stratified models offer limited benefit and do not fully resolve performance disparities.
- SHAP analysis identifies social and health determinants, especially self-rated health, caste, region, and income, as key drivers of predictions.
How this study might affect research, practice or policy
- Encourages routine subgroup evaluation to ensure ML models do not exacerbate existing health inequities.
- Challenges the assumption that removing sensitive variables like caste or income improves fairness, emphasizing the need to address structural drivers directly.
- Supports the integration of social determinants into model development to enhance equity, transparency, and relevance in public health applications.