Machine Learning–Based Prediction of Ultrasound-Detected Hepatic Steatosis Within the Metabolic Dysfunction–Associated Steatotic Liver Disease Spectrum Using Routine Clinical and Biochemical Parameters

Abstract

Background/Objectives: Metabolic dysfunction-associated steatotic liver disease (MASLD) is now the leading cause of chronic liver disease globally, mirroring the increasing prevalence of obesity, insulin resistance, and type 2 diabetes. Early detection of hepatic steatosis is vital for cardiometabolic risk assessment; however, conventional imaging is costly and impractical for population screening. This study aimed to develop interpretable machine-learning models to predict ultrasound-detected hepatic steatosis within the MASLD spectrum using routinely available clinical and biochemical data. Methods: We analyzed data from 644 adults, 50% of whom had ultrasound-detected hepatic steatosis. Preprocessing, imputation, and feature selection were implemented within a single scikit-learn pipeline to avoid information leakage. An Elastic Net–regularized logistic regression identified the top 20 predictors, which were subsequently used across nine supervised machine learning (ML) classifiers. Model performance was evaluated via repeated stratified 5-fold cross-validation (25 resamples) using accuracy, F1 score, sensitivity, specificity, Youden’s J, balanced accuracy, and Area Under the Receiver Operating Characteristic Curve (AUROC). Interpretability was assessed using SHapley Additive exPlanations (SHAP). Results: Participants with ultrasound-detected hepatic steatosis exhibited greater adiposity, insulin resistance, and dyslipidemia compared with controls [p < 0.05 for body mass index (BMI), waist circumference, glucose, glycated hemoglobin (HbA1c), triglycerides]. Elastic Net selection highlighted Weight, Ponderal Index, Fibrosis-4 Index (FIB-4), blood urea nitrogen (BUN)/Creatinine ratio, Aspartate Aminotransferase to Platelet Ratio Index (APRI), and Visceral Adiposity Index as the strongest predictors.
Logistic Regression and Gradient Boosting achieved the best performance (accuracy = 0.65 ± 0.03; AUROC = 0.71 ± 0.04; balanced accuracy = 0.66 ± 0.06), outperforming rule-based indices such as the Fatty Liver Index (FLI) and the Hepatic Steatosis Index (HSI) reported in the literature. SHAP analysis confirmed clinically coherent feature effects, with higher anthropometric and hepatic injury indices increasing the predicted probability of ultrasound-detected hepatic steatosis. Conclusions: Routinely available clinical and biochemical parameters can predict hepatic steatosis with moderate accuracy using transparent, interpretable ML models. Logistic Regression and Gradient Boosting provided the best discrimination and robust internal performance, offering a pragmatic, low-cost approach for early identification of ultrasound-detected hepatic steatosis within the MASLD spectrum in primary and metabolic care settings.
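
The leakage-avoiding design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic (generated with `make_classification`), and the median imputation, standardization, `l1_ratio=0.5`, and the final logistic regression classifier are all assumptions chosen for illustration. The point it shows is that imputation, scaling, and Elastic Net selection of 20 features are refit inside every training fold of a repeated stratified 5-fold cross-validation (5 repeats, 25 resamples), so no statistics from the held-out fold reach the model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical data (assumption): 644 rows, balanced classes.
X, y = make_classification(
    n_samples=644, n_features=40, n_informative=10,
    n_clusters_per_class=1, weights=[0.5, 0.5], random_state=0,
)
# Inject about 5% missing values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Elastic Net logistic regression used only to rank features;
# threshold=-inf makes max_features the sole selection criterion (top 20).
selector = SelectFromModel(
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, max_iter=5000),
    max_features=20, threshold=-np.inf,
)

# Every step is fit on the training fold only, then applied to the test fold.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", selector),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scoring = {
    "auroc": "roc_auc",
    "balanced_accuracy": "balanced_accuracy",
    "sensitivity": "recall",  # recall of the positive class
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of the negative class
}
scores = cross_validate(pipeline, X, y, cv=cv, scoring=scoring)

# Youden's J per resample: sensitivity + specificity - 1.
youden_j = scores["test_sensitivity"] + scores["test_specificity"] - 1

for name in scoring:
    values = scores[f"test_{name}"]
    print(f"{name}: {values.mean():.2f} +/- {values.std():.2f}")
print(f"youden_j: {youden_j.mean():.2f} +/- {youden_j.std():.2f}")
```

Because the selector sits inside the `Pipeline`, the 20 retained features can differ between resamples; a fixed feature list chosen on the full dataset before cross-validation would reintroduce the leakage the abstract says was avoided. The numbers printed here describe the synthetic data only and say nothing about the study's results.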
