Explainable machine learning model incorporating urinary heavy metals to predict nonalcoholic fatty liver disease
Abstract
Objectives: This study aimed to develop and validate an explainable machine learning (ML) model for predicting nonalcoholic fatty liver disease (NAFLD) from urinary heavy metals and phenotypic indices.

Methods: Data were drawn from the National Health and Nutrition Examination Survey (NHANES) 2017-2020. NAFLD was defined as a controlled attenuation parameter (CAP) ≥ 274 dB/m. Urinary heavy metals were quantified by inductively coupled plasma mass spectrometry and normalized to urinary creatinine to account for urine dilution. Four ML algorithms (LightGBM, NNET, SVM, and XGBoost) were implemented. The dataset was split into training (60%) and validation (40%) sets.

Results: Among 1,213 adults, 512 were classified as having NAFLD and 701 as controls. XGBoost outperformed the other three algorithms (AUC = 0.7983; Brier score = 0.1804). Feature importance was assessed using SHapley Additive exPlanations (SHAP), which identified a minimal subset of 10 features that preserved model performance. The strongest predictors were body roundness index, triglyceride, diabetes mellitus, sex, age, and urinary concentrations of cadmium, cesium, barium, lead, and tungsten. Both global and local SHAP interpretations validated these features' contributions. The optimized XGBoost model was deployed as a web application (https://wxqdepression.shinyapps.io/nafldapp/).

Conclusions: XGBoost achieved the best performance in predicting NAFLD using a streamlined set of urinary heavy metals and phenotypic indicators. SHAP-based interpretability confirmed the relevance of this minimal feature set.
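
To make the Methods and Results more concrete, the following is a minimal Python sketch of three calculations the abstract relies on: creatinine normalization of a urinary metal concentration, the Brier score, and the AUC. It is not the authors' code. The function names are hypothetical, and the units are assumptions (metal in µg/L, creatinine in mg/dL); the abstract does not state which units or software were used.

```python
from typing import Sequence


def creatinine_adjust(metal_ug_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Return a urinary metal concentration in ug per g creatinine.

    Assumed units (not stated in the abstract): metal in ug/L,
    creatinine in mg/dL.
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine must be positive")
    creatinine_g_per_l = creatinine_mg_per_dl * 0.01  # 1 mg/dL = 0.01 g/L
    return metal_ug_per_l / creatinine_g_per_l


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random control.

    Ties between a case and a control count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A lower Brier score indicates better-calibrated probabilities (0 is perfect), while an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. In practice, library implementations such as those in scikit-learn would normally be preferred over hand-written versions.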