Correcting Algorithmic Bias in Machine Learning Prediction of Healthcare Utilization in India
Abstract
Objective
This study investigates how historical disparities in healthcare access introduce algorithmic bias into machine learning (ML) predictions of healthcare utilization among older adults in India. We examine the extent to which standard ML models underestimate utilization in disadvantaged populations and quantify the resulting distortion in national-level cost projections.
Methods
Using data from 55,698 respondents in the Longitudinal Ageing Study in India (LASI), we trained two sets of ML models to predict outpatient and inpatient utilization: one on the full population (Model 1) and another on a subsample whose healthcare needs were met (Model 2). Gradient Boosting was selected as the best-performing algorithm. To interpret model predictions and identify key drivers of healthcare utilization, we applied SHapley Additive exPlanations (SHAP). We compared model outputs across socioeconomic subgroups and extrapolated predicted utilization to national population estimates using WHO-CHOICE unit costs.
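A minimal sketch of this two-model design, assuming a pandas DataFrame `lasi` with a binary `need_met` flag, numerically encoded feature columns, and an `outpatient_visits` target (all names and the file path are hypothetical, not the study's actual variable names):

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical LASI extract: features (assumed numerically encoded),
# a utilization target, and a flag for whether reported needs were met.
lasi = pd.read_csv("lasi_wave1.csv")  # assumed file name
X_cols = ["age", "mpce_quintile", "srh", "chronic_count", "caste_code", "sex"]

def fit_gbm(df):
    """Fit a Gradient Boosting model predicting outpatient visits."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[X_cols], df["outpatient_visits"], test_size=0.2, random_state=42
    )
    model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
    return model, X_test

# Model 1: full population (carries historical access disparities).
model1, X_eval = fit_gbm(lasi)
# Model 2: only respondents whose healthcare needs were met.
model2, _ = fit_gbm(lasi[lasi["need_met"] == 1])

# SHAP values to identify key drivers of predicted utilization.
explainer = shap.TreeExplainer(model2)
shap_values = explainer.shap_values(X_eval)
shap.summary_plot(shap_values, X_eval)
```

An analogous pair of models would be fit for inpatient utilization; the contrast between the two training populations is what isolates the effect of unmet need.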
Findings
Model 1 consistently underestimated healthcare utilization relative to Model 2, particularly among lower-income and marginalized caste groups. Overall, outpatient and inpatient predictions from Model 2 were 8.92% (95% CI: 8.87–8.99) and 9.59% (95% CI: 9.28–9.85) higher, respectively. Nationally, this translated to an underestimation of I$390.7 (391.2–391.5) million in outpatient care and I$88.4 (86.2–90.1) million in inpatient care. The largest gaps were concentrated in the poorest and most marginalized subgroups. The SHAP analysis suggests that self-rated health (SRH), economic status (monthly per capita consumption expenditure, MPCE), and chronic conditions are consistently influential in predicting outpatient and inpatient visits, with some shifts in feature importance between models.
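Continuing the sketch above, the reported gap can be quantified by comparing the two models' predictions on the same evaluation sample and scaling the per-capita difference by population size and unit cost. The population count and I$ unit cost below are placeholders, not the study's WHO-CHOICE inputs:

```python
# Per-capita predicted outpatient visits from the two models.
pred_full = model1.predict(X_eval)   # Model 1: full population
pred_met = model2.predict(X_eval)    # Model 2: met-need counterfactual

# Relative underestimation of Model 1 vs. Model 2 (percent).
gap_pct = 100 * (pred_met.sum() - pred_full.sum()) / pred_full.sum()

# National extrapolation with hypothetical inputs: older-adult
# population and an assumed I$ outpatient unit cost.
N_OLDER_ADULTS = 138_000_000   # assumed population aged 45+
UNIT_COST_ID = 15.0            # assumed I$ per outpatient visit

per_capita_gap = pred_met.mean() - pred_full.mean()
national_underestimate = per_capita_gap * N_OLDER_ADULTS * UNIT_COST_ID
print(f"Gap: {gap_pct:.2f}%; national underestimate: "
      f"I${national_underestimate / 1e6:.1f} million")
```

In the study itself, confidence intervals around these gaps would come from resampling (e.g., bootstrapping the evaluation sample), and the same calculation would be repeated within socioeconomic subgroups to locate where the underestimation concentrates.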
Conclusion
Machine learning models trained on unadjusted population data encode algorithmic bias and risk perpetuating structural inequities by underrepresenting unmet need. Models trained on fulfilled-care scenarios yield more equitable and accurate projections.