Machine Learning–Based Prediction of Ultrasound-Detected Metabolic Dysfunction–Associated Steatotic Liver Disease Using Routine Clinical and Biochemical Parameters
Abstract
Background/Objectives: Metabolic dysfunction–associated steatotic liver disease (MASLD) is now the leading cause of chronic liver disease globally, mirroring the increasing prevalence of obesity, insulin resistance, and type 2 diabetes. Early detection of hepatic steatosis is vital for cardiometabolic risk assessment; however, conventional imaging is costly and impractical for population screening. This study aimed to develop interpretable machine-learning models to predict ultrasound-detected MASLD using routinely available clinical and biochemical data. Methods: We analyzed data from 644 adults (50% with MASLD on ultrasonography). Preprocessing, imputation, and feature selection were implemented within a single scikit-learn pipeline to avoid information leakage. An Elastic Net–regularized logistic regression identified the top 20 predictors, which were subsequently used across nine supervised machine learning (ML) classifiers. Model performance was evaluated via repeated stratified 5-fold cross-validation (25 resamples) using accuracy, F1 score, sensitivity, specificity, Youden’s J, balanced accuracy, and area under the receiver operating characteristic curve (AUROC). Interpretability was assessed using SHapley Additive exPlanations (SHAP). Results: Participants with MASLD exhibited greater adiposity, insulin resistance, and dyslipidemia compared with controls [p < 0.05 for body mass index (BMI), waist circumference, glucose, HbA1c, and triglycerides]. Elastic Net selection highlighted Weight, Ponderal Index, Fibrosis-4 Index (FIB-4), blood urea nitrogen (BUN)/Creatinine ratio, Aspartate Aminotransferase to Platelet Ratio Index (APRI), and Visceral Adiposity Index as the strongest predictors. Logistic Regression and Gradient Boosting achieved the best performance (accuracy = 0.65 ± 0.03; AUROC = 0.71 ± 0.04; balanced accuracy = 0.66 ± 0.06), outperforming rule-based indices such as Fatty Liver Index (FLI) and Hepatic Steatosis Index (HSI) reported in the literature. 
SHAP analysis confirmed clinically coherent feature effects, with higher anthropometric and hepatic injury indices increasing predicted MASLD probability. Conclusions: Routinely available clinical and biochemical parameters can predict hepatic steatosis with moderate accuracy using transparent, interpretable ML models. Logistic Regression and Gradient Boosting provided the best discrimination and generalizability, offering a pragmatic, low-cost approach for early MASLD screening in primary and metabolic care settings.
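The leakage-safe design described in the Methods can be illustrated with a minimal scikit-learn sketch. It is not the authors' code. The data are synthetic (`make_classification` stands in for the 644-participant cohort), and the 40 candidate features, median imputation, standardization, `l1_ratio=0.5`, and the final Logistic Regression classifier are all assumptions chosen for illustration. The 25 resamples are read here as 5 folds × 5 repeats. Because imputation, scaling, and Elastic Net selection sit inside the `Pipeline`, each is refit on the training folds only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 644 rows, balanced classes,
# 40 candidate features, about 5% of values set to missing.
X, y = make_classification(n_samples=644, n_features=40, n_informative=10,
                           weights=[0.5, 0.5], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Elastic Net-regularized logistic regression used only to rank features;
# threshold=-inf with max_features=20 keeps the 20 largest |coefficients|.
selector = SelectFromModel(
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
    max_features=20,
    threshold=-np.inf,
)

# Every preprocessing step lives inside the pipeline, so cross-validation
# refits it on each training split and never sees the held-out fold.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", selector),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5 folds x 5 repeats = 25 resamples.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scoring = {
    "accuracy": "accuracy",
    "f1": "f1",
    "balanced_accuracy": "balanced_accuracy",
    "roc_auc": "roc_auc",
    "sensitivity": make_scorer(recall_score, pos_label=1),
    "specificity": make_scorer(recall_score, pos_label=0),
}
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)

# Youden's J per resample: sensitivity + specificity - 1.
youden_j = scores["test_sensitivity"] + scores["test_specificity"] - 1

for name in scoring:
    vals = scores[f"test_{name}"]
    print(f"{name}: {vals.mean():.2f} +/- {vals.std():.2f}")
print(f"youden_j: {youden_j.mean():.2f} +/- {youden_j.std():.2f}")
```

Swapping the `clf` step for another estimator (for example `GradientBoostingClassifier`) reuses the same selection and resampling scheme, which is one way the nine classifiers could be compared on equal footing. For a binary outcome, balanced accuracy equals (Youden's J + 1) / 2, so the two metrics carry the same information on different scales.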