Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest
Abstract
Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet forecasting it accurately remains challenging because demographic, environmental, and healthcare factors interact. This study evaluates three machine learning models, Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), on a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with the coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy (R² = 0.9423), outperforming both LR and RDT. Interpretability was prioritized through p-values for LR and feature-importance metrics for the tree-based models. These identify immunization rates (diphtheria, measles) and health and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of the life-expectancy predictions. These insights highlight how ensemble methods and transparent, interpretable analysis can work together to address public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and more recent data to further refine predictive accuracy and support evidence-based policymaking in global health contexts.
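The three evaluation metrics named in the abstract can be made concrete with a short sketch. The code below is a minimal, standard-library Python implementation of the textbook definitions of R², MAE and RMSE. It is not the study's code. The function name `regression_metrics` and the sample life-expectancy values are hypothetical illustrations and are not taken from the WHO/UN dataset.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, MAE and RMSE for paired observed/predicted values.

    Hypothetical helper for illustration; textbook definitions only.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return {
        "r2": 1.0 - ss_res / ss_tot,                         # share of variance explained
        "mae": sum(abs(r) for r in residuals) / n,           # mean absolute error
        "rmse": math.sqrt(ss_res / n),                       # root mean squared error
    }


# Hypothetical observed vs. predicted life expectancies (years).
observed = [70.0, 65.0, 80.0, 75.0]
predicted = [71.0, 63.0, 79.0, 76.0]
print(regression_metrics(observed, predicted))
```

For these illustrative values, the residual sum of squares is 7 and the total sum of squares is 125. This gives R² = 1 − 7/125 = 0.944, MAE = 1.25 years and RMSE = √1.75 ≈ 1.32 years. R² compares a model against a mean-only baseline. MAE and RMSE are expressed in the target's units, and RMSE penalizes large errors more heavily than MAE.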