Early prediction of severity progression in patients with chronic kidney disease: A Machine Learning Predictive Modelling analysis with retrospective data of a tertiary care hospital

Abstract

Background Chronic Kidney Disease (CKD) represents a growing health burden, particularly in low- and middle-income countries. Its progression to end-stage renal disease (ESRD) necessitates resource-intensive interventions. Inappropriate activation of the complement system, particularly Complement 3 (C3) and Complement 4 (C4), has been implicated in renal injury. While these markers are biologically relevant, their predictive value for CKD severity has not been adequately explored. This study aimed to assess the potential of serum C3 and C4 in predicting CKD severity using machine learning (ML) models.

Methods A retrospective dataset comprising 2,279 adults (>17 years) was extracted from the laboratory records of AIIMS Bhubaneswar. CKD severity was classified using the 2021 CKD-EPI Creatinine formula, with Stage G3b and above defined as severe CKD. Of these, 1,331 complete records (with C3, C4, and creatinine) formed the internal dataset, while two external datasets were used for validation: one with clinically confirmed CKD and the other with age- and sex-matched controls. Predictive models were developed using five ML algorithms: Random Forest (RDF), XGBoost (XGB), Gradient Boosting (GB), Decision Tree (DT), and Artificial Neural Networks (ANN). Model performance was evaluated using accuracy, F1 score, R², diagnostic odds ratio (DOR), and other metrics.

Results Recursive Feature Elimination identified age, C3, and the C3/C4 ratio as the most influential predictors. Among the models, RDF performed best (F1: 0.984, Accuracy: 0.991, DOR: 9348, R²: 0.954). External validation confirmed its high diagnostic power (DOR: 494 and 465 for the CKD and control datasets, respectively). A web-based tool was developed to aid clinicians in estimating CKD severity using age, C3, and C4 values.

Conclusion This study demonstrates that serum complements C3 and C4 can serve as early predictive biomarkers for CKD severity when interpreted via machine learning models. The RDF-based prediction pipeline offers a clinically relevant, non-invasive tool for stratifying CKD patients, potentially reducing the burden of late-stage interventions. Further prospective studies are warranted to validate these findings longitudinally.
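
The severity label and the DOR metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the published 2021 CKD-EPI creatinine equation (serum creatinine in mg/dL), takes "Stage G3b and above" to mean eGFR below 45 mL/min/1.73 m² (the KDIGO upper bound of G3b), and uses the textbook definition DOR = (TP × TN) / (FP × FN). All function names are hypothetical, and the abstract does not say how zero cells in the confusion matrix were handled.

```python
def egfr_ckd_epi_2021(creatinine_mg_dl: float, age_years: float, sex: str) -> float:
    """Estimate eGFR (mL/min/1.73 m^2) with the 2021 CKD-EPI creatinine equation.

    Assumes serum creatinine is given in mg/dL and sex is "male" or "female".
    """
    if sex == "female":
        kappa, alpha, sex_factor = 0.7, -0.241, 1.012
    elif sex == "male":
        kappa, alpha, sex_factor = 0.9, -0.302, 1.0
    else:
        raise ValueError("sex must be 'male' or 'female'")
    ratio = creatinine_mg_dl / kappa
    return (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
        * sex_factor
    )


def is_severe_ckd(egfr: float, threshold: float = 45.0) -> bool:
    """Label a record as severe CKD (Stage G3b or worse).

    Assumption: G3b and above corresponds to eGFR < 45 mL/min/1.73 m^2.
    """
    return egfr < threshold


def c3_c4_ratio(c3: float, c4: float) -> float:
    """Derived feature: ratio of serum C3 to serum C4 (same units for both)."""
    if c4 <= 0:
        raise ValueError("C4 must be positive")
    return c3 / c4


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Diagnostic odds ratio from confusion-matrix counts: (TP * TN) / (FP * FN).

    Raises ValueError when FP or FN is zero, because the ratio is undefined
    there; a continuity correction would be a separate design choice.
    """
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


if __name__ == "__main__":
    # Hypothetical patient: 60-year-old male with creatinine 2.0 mg/dL.
    egfr = egfr_ckd_epi_2021(2.0, 60, "male")
    print(f"eGFR = {egfr:.1f}, severe = {is_severe_ckd(egfr)}")
    # Hypothetical confusion matrix, not data from the study.
    print("DOR =", diagnostic_odds_ratio(tp=90, fp=5, fn=10, tn=95))
```

Because the label is derived from creatinine, age, and sex, a model trained on age, C3, and the C3/C4 ratio predicts a threshold on eGFR rather than an independently adjudicated diagnosis. The sketch makes that dependency explicit.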
