A Metabolomics-Guided Machine Learning Model for Diagnosis and Differential Diagnosis of Diabetic Kidney Disease: A Dual-Center Study
Abstract
Background: Early diagnosis and timely intervention are critical for delaying the progression of diabetic kidney disease (DKD) to end-stage renal disease (ESRD). This study integrated metabolomics profiling with machine learning algorithms to identify blood-based biomarkers associated with DKD, including 1,5-anhydroglucitol (1,5-AG) and multiple fatty acids, and to develop predictive models for both diagnosis and differential diagnosis.

Methods: Clinical data and serum samples were collected from 1,038 patients with DKD, diabetes mellitus, or non-DKD chronic kidney disease (CKD) at the First Affiliated Hospital of Dalian Medical University. Concentrations of fatty acids and 1,5-AG were quantified by HPLC–MS/MS. Candidate biomarkers were screened using regression analyses. Diagnostic models for DKD were developed using four algorithms (binary logistic regression, random forest, decision tree, and naive Bayes), while a logistic regression model was applied to differentiate DKD from non-diabetic CKD. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. External validation was conducted in an independent cohort of 236 patients (DKD and diabetes without renal insufficiency) from the Second Affiliated Hospital.

Results: Four biomarkers were identified for DKD diagnosis: C16:0, C18:0, estimated glomerular filtration rate (eGFR), and glucose. Logistic regression achieved the best performance, with AUCs of 0.920 (training), 0.879 (internal validation), and 0.881 (external validation). For differential diagnosis between DKD and non-diabetic CKD, a model based on five biomarkers (1,5-AG, glucose, C18:0, body mass index [BMI], and diastolic blood pressure [DBP]) yielded AUCs of 0.873 (training) and 0.812 (internal validation). Correlation analyses revealed that 1,5-AG was negatively associated with glucose and eGFR, but positively associated with serum creatinine, uric acid, and urea. In contrast, C14:0, C20:0, and C24:1 were positively correlated with glucose and eGFR, but negatively correlated with serum creatinine. Multivariate analysis identified C24:1, C20:0, and 1,5-AG as independent risk factors for DKD progression.

Conclusion: Fatty acids C24:1 and C20:0, along with 1,5-AG, may independently increase the risk of DKD progression, and renal function appears to influence 1,5-AG levels. Both the diagnostic and differential diagnostic models demonstrated robust predictive performance for DKD in independent cohorts.
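The three performance measures named in the Methods (AUC, sensitivity, and specificity) can be illustrated with a minimal sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and no study data are used. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half.

```python
# Illustrative only: toy labels/scores, hypothetical function names,
# and an assumed 0.5 threshold. None of this comes from the study.

def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = disease present, 0 = disease absent (made-up values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity depend on the single cut-off chosen, which is why the three are usually reported together.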