Evaluating Algorithmic Fairness in Predicting Health Service Use and Unmet Need Across Socioeconomic and Caste Subgroups: Evidence from the Longitudinal Ageing Study in India
Abstract
Introduction
Persistent socioeconomic and caste inequalities in India drive disparities in healthcare access. Machine learning (ML) models offer promise for forecasting service use and unmet needs, but may perpetuate algorithmic bias against disadvantaged groups. We evaluated both performance and fairness of several ML algorithms across diverse caste and socioeconomic subgroups.
Methods
We used nationally representative data from India to develop machine learning models predicting outpatient care, hospitalization, and unmet healthcare need among older adults. We trained logistic regression, random forest, XGBoost, and LightGBM models using demographic, social, and health-related predictors. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. We assessed model performance using the area under the receiver operating characteristic curve (AUROC) and evaluated fairness across caste and income subgroups. Fairness strategies included removing sensitive features ("neutral" models) and training stratified models within subgroups. We used SHapley Additive exPlanations (SHAP) to identify the most influential predictors across outcomes.
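The two fairness strategies described above can be sketched as simple data-preparation steps: a "neutral" model drops the sensitive attributes before training, while stratified models partition the training records by subgroup. This is a minimal illustration on toy records; the field names (`caste`, `mpce`, `age`) are stand-ins, not the actual LASI variable names:

```python
def make_neutral(records, sensitive=("caste", "mpce")):
    """'Neutral model' input: drop sensitive features before training."""
    return [{k: v for k, v in r.items() if k not in sensitive}
            for r in records]

def stratify(records, key="caste"):
    """Stratified-model input: split records into one training set per subgroup."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return groups

# Toy usage (illustrative values only):
records = [
    {"caste": "SC", "mpce": 1, "age": 70},
    {"caste": "General", "mpce": 5, "age": 65},
]
neutral = make_neutral(records)     # sensitive columns removed
per_group = stratify(records)       # one dataset per caste group
```

Either prepared dataset would then feed the same model-training step (logistic regression, LightGBM, etc.), which is omitted here.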
Results
Among 55,962 older adults in India, 53.4% had at least one outpatient visit, 6.5% were hospitalized, and 7.9% reported unmet healthcare needs. Model performance varied across outcomes and groups. The best-performing model (LightGBM) achieved AUROCs of 0.78 for unmet need, 0.76 for outpatient care, and 0.70 for hospitalization. Predictive accuracy was higher in the lowest socioeconomic group (MPCE 1, AUROC = 0.79) compared to the highest (MPCE 5, AUROC = 0.75). Removing sensitive predictors such as caste or income had minimal impact (change in AUROC <0.02), and subgroup-specific models led to mixed results, with only marginal improvement for Scheduled Castes (AUROC from 0.78 to 0.80). Including social and health determinants substantially improved model performance (e.g., hospitalization AUROC increased from 0.57 to 0.70). Top predictors included self-rated health, region, grip strength, and socioeconomic status. Balancing techniques like SMOTE did not meaningfully enhance performance.
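The subgroup comparison reported above (AUROC computed separately within each caste or MPCE group) can be sketched with a small, dependency-free rank-based AUROC implementation. The labels, scores, and group names below are illustrative, not study data:

```python
from collections import defaultdict

def auroc(y_true, y_score):
    """Rank-based AUROC: Mann-Whitney U statistic divided by n_pos * n_neg."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # find the run of tied scores and assign each the average rank
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def subgroup_auroc(y_true, y_score, groups):
    """AUROC within each subgroup (e.g. caste category or MPCE quintile)."""
    by_group = defaultdict(lambda: ([], []))
    for t, s, g in zip(y_true, y_score, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(s)
    return {g: auroc(t, s) for g, (t, s) in by_group.items()}
```

Comparing the per-group values returned by `subgroup_auroc` is what reveals disparities such as the MPCE 1 vs. MPCE 5 gap reported above; in practice one would use a library routine (e.g. scikit-learn's `roc_auc_score`) rather than this hand-rolled version.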
Conclusions
Machine learning models can effectively predict healthcare use and unmet needs among older adults in India. Incorporating social and health determinants improves model accuracy, but eliminating bias requires structural changes beyond technical adjustments.
Fairness-aware model development and deployment are essential to ensure predictive tools contribute to more equitable healthcare systems.
What is already known on this topic
- Machine learning (ML) has shown promise in predicting healthcare use and unmet need, particularly in high-income settings.
- Structural inequalities such as caste and income may influence healthcare access, but few ML studies have evaluated how these social factors affect model performance.
- Fairness concerns in ML are increasingly recognized, yet methods to assess or address them in low- and middle-income country (LMIC) settings remain limited.
What this study adds
- This study evaluates ML model performance across caste and income subgroups using nationally representative data from older adults in India.
- It shows that model accuracy varies by subgroup, with better performance among Scheduled Castes and lower-income groups for predicting unmet need.
- Fairness interventions such as removing sensitive features or training stratified models offer limited benefit and do not fully resolve performance disparities.
- SHAP analysis identifies social and health determinants, especially self-rated health, caste, region, and income, as key drivers of predictions.
How this study might affect research, practice or policy
- Encourages routine subgroup evaluation to ensure ML models do not exacerbate existing health inequities.
- Challenges the assumption that removing sensitive variables like caste or income improves fairness, emphasizing the need to address structural drivers directly.
- Supports the integration of social determinants into model development to enhance equity, transparency, and relevance in public health applications.