Development and Internal Validation of an Interpretable Machine Learning Model for Predicting Atrial Fibrillation in Patients with Diabetic Kidney Disease: A Multicenter Study
Abstract
Background: Patients with diabetic kidney disease (DKD) have an elevated risk of atrial fibrillation (AF), yet prediction tools specific to this population are limited. We aimed to develop and internally validate an interpretable machine-learning (ML) model for AF risk in hospitalized patients with DKD.

Methods: In this retrospective cohort from two hospitals (January 2021 to December 2024), 787 unique admissions of patients with DKD were randomly split into training (70%) and test (30%) sets. AF at the index admission was ascertained from electrocardiograms, Holter monitoring when available, and ICD-10 codes with physician adjudication. Candidate predictors were routine clinical, laboratory, and echocardiographic variables. Least absolute shrinkage and selection operator (LASSO) regression selected features in the training set. Seven supervised models were trained; performance was assessed by area under the receiver-operating characteristic curve (AUC), calibration, and decision-curve analysis. SHapley Additive exPlanations (SHAP) quantified predictor contributions.

Results: LASSO retained 14 features, including 24-hour urine total protein (24UTP), serum creatinine (SCr), age, and left atrial diameters. In the test set, k-nearest neighbors (KNN) achieved an AUC of 0.927, accuracy of 0.886, sensitivity of 0.920, and specificity of 0.856; calibration was good, and decision curves showed net benefit across common thresholds. Five-fold cross-validation yielded a mean AUC of 0.90 ± 0.02. SHAP indicated proteinuria burden, renal dysfunction, age, and atrial size as the leading contributors. The finalized model was deployed as a secure web calculator that uses routine inputs.

Conclusions: An interpretable ML-based model using standard clinical and echocardiographic data showed stable internal performance for AF risk estimation in DKD, with an accompanying web calculator for point-of-care use. Prospective multicenter studies are needed to confirm generalizability and clinical impact.
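
For readers less familiar with the evaluation metrics named in the Methods, the following is a minimal sketch of how AUC, the confusion-matrix counts behind sensitivity and specificity, and decision-curve net benefit can be computed from predicted probabilities. It is not the authors' code: the function names (`auc`, `confusion`, `net_benefit`) and the toy labels and scores are hypothetical and are not study data. The net-benefit formula used is the standard one, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random AF case scores higher than a
    random non-AF case; ties count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), classifying score >= threshold as AF."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_af = s >= threshold
        if predicted_af and y == 1:
            tp += 1
        elif predicted_af and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy example (hypothetical values, NOT data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]

tp, fp, tn, fn = confusion(y_true, scores, 0.5)
print("AUC:", auc(y_true, scores))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("accuracy:", (tp + tn) / len(y_true))
print("net benefit at pt=0.5:", net_benefit(y_true, scores, 0.5))
```

Evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies gives a decision curve; which thresholds count as "common" depends on the clinical context and is not specified here.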