Internal and External Validation of Machine Learning Algorithms Versus FINDRISC for Incident Type 2 Diabetes: A Transparent, Explainable Benchmark Using SHAP

Parsa Amirian
Mahsa Zarpoosh

Abstract

Background

Type 2 diabetes mellitus (T2DM) affects almost half a billion people, and its costs are projected to reach $2.25 trillion by 2030, so early detection strategies are needed for prevention. Although the Finnish Diabetes Risk Score (FINDRISC) enables questionnaire-based risk assessment, its linearity and lack of interpretability limit its predictive power and generalizability across diverse populations. Machine learning (ML) models are promising next-generation prediction tools, but they require rigorous benchmarking, external validation, and explainability before they can be clinically trusted.
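FINDRISC turns questionnaire answers into fixed points that are simply added up. The sketch below illustrates that additive structure, assuming the commonly published cut-offs of the original Finnish questionnaire; the class and function names are hypothetical, the handling of values that fall exactly on a band edge is an assumption, and the study's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class FindriscAnswers:
    age_years: float
    bmi: float                  # body mass index, kg/m^2
    waist_cm: float             # waist circumference, cm
    male: bool
    active_30min_daily: bool    # at least 30 min of physical activity per day
    daily_fruit_veg: bool       # vegetables, fruit, or berries every day
    antihypertensive_meds: bool
    history_high_glucose: bool  # high blood glucose ever detected
    family_history: str         # "none", "second_degree", or "first_degree"


def findrisc_score(a: FindriscAnswers) -> int:
    """Sum fixed item points into the 0-26 FINDRISC total.

    Values exactly on a band edge (e.g. BMI of 25 or 30) are placed in the
    closed "25-30" style bands here; that boundary handling is an assumption.
    """
    score = 0
    if a.age_years >= 65:           # age: <45 -> 0, 45-54 -> 2, 55-64 -> 3, >=65 -> 4
        score += 4
    elif a.age_years >= 55:
        score += 3
    elif a.age_years >= 45:
        score += 2
    if a.bmi > 30:                  # BMI: <25 -> 0, 25-30 -> 1, >30 -> 3
        score += 3
    elif a.bmi >= 25:
        score += 1
    low, high = (94, 102) if a.male else (80, 88)   # sex-specific waist cut-offs (cm)
    if a.waist_cm > high:
        score += 4
    elif a.waist_cm >= low:
        score += 3
    score += 0 if a.active_30min_daily else 2
    score += 0 if a.daily_fruit_veg else 1
    score += 2 if a.antihypertensive_meds else 0
    score += 5 if a.history_high_glucose else 0
    score += {"none": 0, "second_degree": 3, "first_degree": 5}[a.family_history]
    return score


def findrisc_category(score: int) -> str:
    """Map a total score to the commonly used FINDRISC risk bands."""
    if score < 7:
        return "low"
    if score <= 11:
        return "slightly elevated"
    if score <= 14:
        return "moderate"
    if score <= 20:
        return "high"
    return "very high"


example = FindriscAnswers(age_years=58, bmi=31, waist_cm=100, male=True,
                          active_30min_daily=False, daily_fruit_veg=True,
                          antihypertensive_meds=True, history_high_glucose=False,
                          family_history="first_degree")
total = findrisc_score(example)
print(total, findrisc_category(total))  # 18 high
```

Because each answer contributes the same points regardless of the other answers, a score of this form cannot represent interactions between risk factors, which is one way to read the linearity limitation described above.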

Methods

In this prospective cohort (n = 9,171; 7.1 years of follow-up), we compared six supervised ML models, three anomaly detectors, and a stacking ensemble against FINDRISC for predicting incident T2DM, using harmonized, calibrated pipelines with internal validation and external validation in US (NHANES) and PIMA Indian populations. External validation used reduced (7- and 3-variable) models, and explainability was assessed with SHAP (SHapley Additive exPlanations).
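The abstract does not name the base learners or the meta-learner inside the stacking ensemble. As a minimal, dependency-free sketch of the general stacking technique, assuming two hypothetical base learners (logistic regression and Gaussian naive Bayes), a logistic-regression meta-learner, and a synthetic toy cohort, the meta-learner is fitted on out-of-fold base-model predictions so that it never sees a base model's predictions on that model's own training rows. No calibration step is shown.

```python
import math
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]  # returns an estimate of P(y = 1 | x)
Fitter = Callable[[List[List[float]], List[int]], Model]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300) -> Model:
    """Unregularized logistic regression fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_gaussian_nb(X, y) -> Model:
    """Gaussian naive Bayes: per-class feature means and variances plus class priors."""
    params = {}
    for c in (0, 1):
        cols = list(zip(*[xi for xi, yi in zip(X, y) if yi == c]))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        params[c] = (math.log(len(cols[0]) / len(X)), means, variances)

    def log_joint(x, c):
        log_prior, means, variances = params[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
                               for xj, m, v in zip(x, means, variances))

    # P(y=1 | x) = sigmoid(log p(x, y=1) - log p(x, y=0))
    return lambda x: sigmoid(log_joint(x, 1) - log_joint(x, 0))


def fit_stacking(X, y, base_fitters: List[Fitter], k: int = 5) -> Model:
    """Fit a logistic meta-learner on out-of-fold base-model predictions.

    Assumes both classes appear in every training split.
    """
    n = len(X)
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for f in range(k):
        held_out = range(f, n, k)              # interleaved folds, no shuffling
        train = [i for i in range(n) if i % k != f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in held_out:
                oof[i][j] = model(X[i])
    meta = fit_logistic(oof, y)
    final_bases = [fit(X, y) for fit in base_fitters]   # refit on all rows for deployment
    return lambda x: meta([m(x) for m in final_bases])


# Hypothetical synthetic cohort: two informative features and one noise feature.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    label = 1 if rng.random() < 0.3 else 0
    X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

stacked = fit_stacking(X, y, [fit_logistic, fit_gaussian_nb])
print(round(stacked([2.0, 2.0, 0.0]), 3), round(stacked([-1.0, -1.0, 0.0]), 3))
```

Training the meta-learner on out-of-fold rather than in-sample predictions is the design choice that keeps the ensemble from simply rewarding whichever base model overfits most.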

Results

ML models, particularly neural networks and the stacking ensemble, achieved superior internal discrimination (ROC AUC up to 0.87 vs. 0.70 for FINDRISC), and the stacking ensemble reached a recall of 0.81. In reduced-variable external validations, ML models maintained robust performance (AUCs > 0.76); notably, the isolation forest anomaly detector performed particularly well on the US data. A sensitivity analysis showed that, without laboratory data, FINDRISC matched or exceeded the ML models, preserving its practical role in non-laboratory settings. SHAP analysis consistently identified fasting blood sugar (FBS), body mass index (BMI), and age as the main predictors, supporting interpretability.
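The results are reported as ROC AUC (discrimination) and recall (sensitivity). As a from-scratch illustration of what these numbers measure, assuming toy labels and scores rather than study data, AUC is the probability that a randomly chosen case is ranked above a randomly chosen non-case. Recall depends on a decision threshold that the abstract does not report, so the 0.5 cut-off below is an assumption. Because AUC only needs a ranking, an anomaly detector's score can be evaluated the same way once it is oriented so that higher means higher risk.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (Mann-Whitney U); ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Sensitivity: fraction of true cases whose score is at or above the threshold."""
    true_pos = sum(1 for s, t in zip(scores, y_true) if t == 1 and s >= threshold)
    return true_pos / sum(y_true)


y = [0, 0, 1, 1, 0, 1]                     # 1 = developed T2DM (toy labels)
risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(roc_auc(y, risk), 3))          # 0.889
print(round(recall(y, risk, 0.5), 3))      # 0.667

# A detector whose output is larger for more "normal" rows ranks cases
# backwards; negating it turns it into a risk ranking before computing AUC.
normality = [1 - r for r in risk]
print(round(roc_auc(y, [-s for s in normality]), 3))  # 0.889
```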
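SHAP attributes each individual prediction to its input features using Shapley values from cooperative game theory. With only a few features they can be computed exactly by enumerating every feature coalition; the shap Python package instead relies on faster model-specific or sampling-based estimators. The sketch below assumes a hypothetical toy risk function over FBS, BMI, and age with invented weights, and replaces absent features with a single reference profile, which is a simplification of how SHAP handles background data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x by enumerating every feature coalition.

    Features outside a coalition take a single baseline value (a simplifying
    assumption). Cost grows as 2^d, so this only suits a few features.
    """
    d = len(x)

    def value(coalition: frozenset) -> float:
        return f([x[i] if i in coalition else baseline[i] for i in range(d)])

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy score over (FBS mg/dL, BMI kg/m^2, age years); the weights
# and the BMI x age interaction are invented for illustration only.
def toy_risk(z: Sequence[float]) -> float:
    fbs, bmi, age = z
    return 0.04 * fbs + 0.08 * bmi + 0.02 * age + 0.001 * bmi * age


person = [126.0, 32.0, 60.0]
reference = [95.0, 26.0, 45.0]
phi = exact_shapley(toy_risk, person, reference)
for name, contribution in zip(["FBS", "BMI", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
# Efficiency property: contributions sum to f(person) - f(reference)
print(round(sum(phi), 6), round(toy_risk(person) - toy_risk(reference), 6))
```

The contributions printed here reflect the invented weights, not the study's findings; the point is that the per-feature values always add up to the gap between this person's prediction and the reference prediction, with the BMI x age interaction split between those two features.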

Conclusions

When externally validated, harmonized ML models substantially improve on traditional risk scores for T2DM prediction, particularly when laboratory data are available. Transparent analytics and an open-source online calculator support global clinical deployment. This work advances precision prevention in T2DM through explainable, portable prediction.

Version published on medRxiv (doi: 10.1101/2025.09.05.25335151), Sep 7, 2025

Related articles

Comprehensive, Transparent, and Fair Machine Learning Models for Hypertension Risk Prediction: Benchmarking With Framingham, External Validation, Individual-Level Analysis, and Equitable Clinical Utility
Parsa Amirian, Mahsa Zarpoosh. Latest version Sep 5, 2025.

Construction and Validation of an Interpretable Machine Learning Model for Predicting Diabetes Risk in COPD Patients
Lingpin Pang, Siyan Xu, Yingxin Wang, Tao Huang, Qian Xian, Wenjia Lin, Haowen Pang, Zhirui Chen, Bozhi Zhong, Hui Miao, Hui Chen, Xishi Sun, Jie Sun. Latest version Aug 19, 2025.

DiaHealth: Early Prediction of Type-2 Diabetes with Associated Risk Factors Using Machine Learning and Explainable AI
Marzia Zaman, Md. Jobayer Rahman, Tabia Tanzin Prama, Farhana Farhana, Khondaker A. Mamun. Latest version Sep 2, 2025.