Comparative Evaluation of Classification and Regression Algorithms for Chronic Kidney Disease Assessment Using Clinical and Laboratory Features

Abstract

Chronic kidney disease (CKD) affects millions globally, and early detection is critical for preventing progression to end-stage renal disease. This study evaluated seven machine learning algorithms for CKD classification and glomerular filtration rate (GFR) prediction using clinical and laboratory data from 400 patients. For binary CKD classification, Logistic Regression and Random Forest both achieved 98.75% test accuracy, with Random Forest showing more stable cross-validation performance (±0.99% vs ±1.59%). For continuous GFR prediction, Random Forest substantially outperformed the other models, with a test R² of 0.914, RMSE of 10.20 mL/min/1.73 m², and MAE of 4.37 mL/min/1.73 m², representing clinically meaningful precision across the full physiological spectrum. Ridge Regression achieved only moderate performance (R² = 0.514, MAE = 18.89 mL/min/1.73 m²) with severe heteroscedasticity, while Support Vector Regression performed poorly, with catastrophic errors at high GFR values. Feature correlation analysis revealed expected physiological relationships: hemoglobin and packed cell volume were strongly positively correlated (r ≈ 0.85), and serum creatinine and hemoglobin were negatively correlated (r ≈ -0.35). The results identify Random Forest as the strongest model for both tasks, substantially exceeding standard clinical GFR estimation equations and demonstrating clear potential for deployment in automated screening and risk stratification systems.
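For readers less familiar with the regression metrics quoted in the abstract, the following is a minimal sketch of how R², RMSE, and MAE are conventionally computed. It is not the authors' code: the function name `regression_metrics` and the four GFR values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean squared error: penalises large residuals more heavily.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - (residual SS / total SS).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return r2, rmse, mae


# Hypothetical GFR values in mL/min/1.73 m^2 (not data from the study).
observed = [30.0, 60.0, 90.0, 120.0]
predicted = [35.0, 55.0, 95.0, 115.0]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

RMSE is never smaller than MAE, and the two coincide only when every error has the same magnitude, as in this toy example. A gap as wide as the one reported for Random Forest (RMSE 10.20 against MAE 4.37) is therefore consistent with most predictions being close and a small number of errors being large.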
