Internal and External Validation of Machine Learning Algorithms Versus FINDRISC for Incident Type 2 Diabetes: A Transparent, Explainable Benchmark Using SHAP

Abstract

Background

Type 2 diabetes mellitus (T2DM) affects almost half a billion people, with costs projected to reach $2.25 trillion by 2030; early detection strategies are needed for prevention. Although the Finnish Diabetes Risk Score (FINDRISC) enables questionnaire-based risk assessment, its linearity and lack of interpretability limit predictive power and generalizability across diverse populations. Machine learning (ML) models are next-generation prediction tools, but they require rigorous benchmarking, external validation, and explainability to be clinically trusted.

Methods

In a prospective cohort (n=9,171; 7.1 years of follow-up), we compared six supervised ML models, three anomaly detectors, and a stacking ensemble against FINDRISC for predicting incident T2DM. All models were built with harmonized, calibrated pipelines and validated internally and externally in US (NHANES) and PIMA Indian populations. External validations included reduced (7- and 3-variable) models, and explainability was assessed with SHapley Additive exPlanations (SHAP).
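As a rough illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is a toy, not the study's pipeline: the `toy_risk` model, its coefficients, and the `person` and `reference` values are all hypothetical, and the SHAP library uses faster approximations than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value, and each
    feature's marginal contribution is averaged over all coalitions.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk model: coefficients are illustrative only,
# not estimates from the study.
def toy_risk(features):
    fbs, bmi, age = features
    return 0.03 * fbs + 0.02 * bmi + 0.01 * age


person = [110.0, 31.0, 58.0]     # hypothetical FBS (mg/dL), BMI, age
reference = [90.0, 25.0, 45.0]   # hypothetical baseline values
phi = shapley_values(toy_risk, person, reference)
for name, contribution in zip(["FBS", "BMI", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `person` and the prediction for `reference`, which is the additivity property that makes Shapley-based attributions easy to read.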

Results

ML models, particularly neural networks and the stacking ensemble, achieved superior internal discrimination (ROC AUC up to 0.87 vs. 0.70 for FINDRISC), and the stacking ensemble reached a recall of 0.81. In reduced-variable external validations, ML models maintained robust performance (AUCs > 0.76), and, strikingly, the isolation forest anomaly detector excelled in US data. Sensitivity analysis demonstrated that, without laboratory data, FINDRISC still matched or exceeded ML, preserving its practical role in non-laboratory settings. SHAP analysis consistently identified fasting blood sugar (FBS), BMI, and age as the main predictors, supporting interpretability.
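For readers less familiar with the two metrics reported above, here is a minimal sketch of how ROC AUC and recall can be computed from predicted risks. The labels, scores, and the 0.5 threshold are made-up values for illustration and are not data from the study.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random case outranks a random
    non-case (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def recall(y_true, y_pred):
    """Share of true cases that were flagged: TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn)


# Made-up example: 1 = developed T2DM during follow-up, 0 = did not.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]  # hypothetical predicted risks
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

auc = roc_auc(y_true, scores)
rec = recall(y_true, y_pred)
print(f"ROC AUC = {auc:.3f}, recall = {rec:.3f}")
```

ROC AUC does not depend on a threshold, whereas recall does, so the two figures answer different questions: how well a model ranks people by risk versus how many future cases it flags at a chosen cut-off.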

Conclusions

Harmonized ML models, when externally validated, substantially improve on traditional risk scores for T2DM prediction, particularly when laboratory data are available. Transparent analytics and an open-source online calculator support global clinical deployment. This work advances precision prevention in T2DM through explainable, portable prediction.
