Development and Validation of an Interpretable Machine Learning Model to Identify Coexisting Type 2 Diabetes Mellitus in Patients with Metabolic Dysfunction-Associated Fatty Liver Disease
Abstract
Patients with metabolic dysfunction-associated fatty liver disease (MAFLD) have a significantly higher risk of type 2 diabetes mellitus (T2DM), but hepatocentric risk prediction remains underexplored. This study aimed to develop and validate an interpretable machine learning model for identifying concomitant T2DM in MAFLD patients. A prospective cohort of 4,472 MAFLD patients (2022–2025) was analyzed, with random allocation to training (n=3,129) and validation (n=1,343) sets. Four machine learning models were compared, with the Boruta and LASSO algorithms used for feature selection. Model performance was evaluated using ROC-AUC, PR-AUC, and calibration plots, and SHAP analysis was used for interpretability. XGBoost demonstrated the best performance, with a validation ROC-AUC of 0.799 (95% CI: 0.763–0.835). The final model incorporated eight variables: age, triglycerides, controlled attenuation parameter, liver stiffness measurement, alanine aminotransferase (ALT), aspartate aminotransferase (AST), high-sensitivity C-reactive protein (hsCRP), and estimated glomerular filtration rate (eGFR). SHAP analysis identified age, triglycerides, and liver stiffness measurement as the predominant predictors. Risk stratification partitioned patients into low-, intermediate-, and high-risk tiers with progressively higher T2DM prevalence (7.4%, 28.1%, and 42.1%, respectively). This XGBoost-based framework provides a clinically viable tool for early T2DM identification in MAFLD patients, facilitating tailored metabolic intervention.
Trial registration: Chinese Clinical Trial Registry (ChiCTR), ChiCTR2200063127, registered on August 31, 2022.
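
The evaluation steps summarized in the abstract (a ROC-AUC with a 95% confidence interval, then T2DM prevalence per risk tier) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. The abstract does not state how the confidence interval was computed or which probability cutoffs define the tiers, so the percentile bootstrap and the `low_cut` and `high_cut` values below are assumptions, and all function names are hypothetical.

```python
import random


def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC-AUC (an assumed method, not taken from the abstract)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no valid bootstrap resamples")
    aucs.sort()
    lower = aucs[int((alpha / 2) * len(aucs))]
    upper = aucs[min(len(aucs) - 1, int((1 - alpha / 2) * len(aucs)))]
    return lower, upper


def tier_prevalence(y_true, scores, low_cut=0.2, high_cut=0.5):
    """Observed outcome prevalence per risk tier; the cutoffs are hypothetical placeholders."""
    tiers = {"low": [], "intermediate": [], "high": []}
    for y, s in zip(y_true, scores):
        if s < low_cut:
            tiers["low"].append(y)
        elif s < high_cut:
            tiers["intermediate"].append(y)
        else:
            tiers["high"].append(y)
    return {name: (sum(ys) / len(ys) if ys else None) for name, ys in tiers.items()}


if __name__ == "__main__":
    # Toy labels (1 = T2DM) and predicted probabilities, for illustration only.
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, probs))
    print(bootstrap_ci(labels, probs))
    print(tier_prevalence(labels, probs))
```

In practice the predicted probabilities would come from the fitted classifier applied to the validation set, and the tier cutoffs would be fixed on the training data before prevalence is examined in validation.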