Machine Learning-Based Prediction of Ultrasound-Detected Hepatic Steatosis Within the Metabolic Dysfunction-Associated Steatotic Liver Disease Spectrum Using Routine Clinical and Biochemical Parameters
Abstract
Background/Objectives: Metabolic dysfunction-associated steatotic liver disease (MASLD) is now the leading cause of chronic liver disease globally, mirroring the increasing prevalence of obesity, insulin resistance, and type 2 diabetes. Early detection of hepatic steatosis is vital for cardiometabolic risk assessment; however, conventional imaging is costly and impractical for population screening. This study aimed to develop interpretable machine-learning models to predict ultrasound-detected hepatic steatosis within the MASLD spectrum using routinely available clinical and biochemical data. Methods: We analyzed data from 644 adults, 50% of whom had ultrasound-detected hepatic steatosis. Preprocessing, imputation, and feature selection were implemented within a single scikit-learn pipeline to avoid information leakage. An Elastic Net-regularized logistic regression identified the top 20 predictors, which were subsequently used across nine supervised machine learning (ML) classifiers. Model performance was evaluated via repeated stratified 5-fold cross-validation (25 resamples) using accuracy, F1 score, sensitivity, specificity, Youden’s J, balanced accuracy, and Area Under the Receiver Operating Characteristic Curve (AUROC). Interpretability was assessed using SHapley Additive exPlanations (SHAP). Results: Participants with ultrasound-detected hepatic steatosis exhibited greater adiposity, insulin resistance, and dyslipidemia compared with controls [p < 0.05 for body mass index (BMI), waist circumference, glucose, glycated hemoglobin (HbA1c), triglycerides]. Elastic Net selection highlighted Weight, Ponderal Index, Fibrosis-4 Index (FIB-4), blood urea nitrogen (BUN)/Creatinine ratio, Aspartate Aminotransferase to Platelet Ratio Index (APRI), and Visceral Adiposity Index as the strongest predictors. 
Logistic Regression and Gradient Boosting achieved the best performance (accuracy = 0.65 ± 0.03; AUROC = 0.71 ± 0.04; balanced accuracy = 0.66 ± 0.06), outperforming rule-based indices such as the Fatty Liver Index (FLI) and Hepatic Steatosis Index (HSI) reported in the literature. SHAP analysis confirmed clinically coherent feature effects, with higher anthropometric and hepatic injury indices increasing the predicted probability of ultrasound-detected hepatic steatosis. Conclusions: Routinely available clinical and biochemical parameters can predict hepatic steatosis with moderate accuracy using transparent, interpretable ML models. Logistic Regression and Gradient Boosting provided the best discrimination and robust internal performance, offering a pragmatic, low-cost approach for early identification of ultrasound-detected hepatic steatosis within the MASLD spectrum in primary and metabolic care settings.
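Several of the threshold-based metrics named in the Methods are simple functions of the same confusion matrix, which can help when reading the reported values side by side. The following is a minimal sketch, not the authors' code: the names `ConfusionCounts`, `confusion_counts`, and `summarize` are hypothetical, and the labels are invented toy data rather than study results. It uses the standard definitions, under which Youden's J equals sensitivity + specificity − 1 and balanced accuracy is the mean of sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix (1 = steatosis on ultrasound, 0 = control)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_pred):
    """Tally true/false positives and negatives from paired 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def summarize(c):
    """Compute threshold-based metrics.

    Assumes both classes are present and at least one positive prediction
    was made, so that no denominator is zero.
    """
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "youden_j": sensitivity + specificity - 1,
    }


# Hypothetical toy labels (NOT study data): 10 cases followed by 10 controls.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 6

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions Youden's J is always 2 × balanced accuracy − 1, so the two metrics rank models identically. AUROC is not covered by this sketch because it depends on the predicted probabilities rather than on a single threshold.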