Search AI: a Machine Learning algorithm for chronic kidney disease risk detection using eight readily available clinical features

Julian Martinez
Natalia Castano-Villegas
Alejandra Perez
Daniel Jimenez
Jose Zea
Isabella Llano
Diego Caro
Jose Javier Arango
Walberto Buelvas
Victor Espriella
Ana María Llerena
William Castro

Abstract

Background: Chronic kidney disease (CKD) is a leading global cause of morbidity and mortality, particularly in low- and middle-income countries (LMICs), where access to specialized laboratory tests is limited. Early detection is essential but is often delayed because it relies on the serum creatinine-based estimated glomerular filtration rate (eGFR). Artificial intelligence (AI) offers opportunities for simple, sensitive screening models that use routinely available variables.

Methods: We trained and tested a low-cost machine learning algorithm on a multicenter Latin American dataset of 203,067 anonymized records to identify patients at risk of CKD, defined as an eGFR <60 mL/min/1.73 m² (CKD-EPI 2021). Eight routinely available, non-invasive variables were used: age, sex, systolic and diastolic blood pressure, body mass index, hypertension, presence of type 2 diabetes (T2D), and diabetes duration (T2DD). To address the imbalance between CKD-positive and CKD-negative cases, oversampling techniques were applied before splitting the dataset into training (70%), validation (12%), and testing (18%) sets. Using the Arkangel AutoML platform, 424 candidate models were generated, including decision trees, random forests, support vector machines, XGBoost, and deep neural networks. Models were prioritized based on predefined criteria: sensitivity >90%, followed by AUC, precision, specificity, and F1 score.

Results: The final model was a decision tree trained on a non-stratified sample with the SMOTE augmentation technique. Sensitivity was 90.2%, specificity 92.7%, precision (PPV) 89%, and AUC 91.4%. Binary regression demonstrated the statistical relevance of all the model's features in predicting CKD risk in our sample. SHAP analysis identified age and diabetes duration as the most influential features in the final ML model.

Conclusions: A decision tree model trained with eight routine clinical variables accurately identified individuals at risk of CKD, achieving high sensitivity and balanced performance without requiring specialized tests. This approach is feasible for large-scale screening in low-resource settings and can be integrated into electronic health records to prioritize confirmatory diagnostics and timely care. It also represents one of the first approaches to CKD diagnosis using ML models trained exclusively on Latin American data.
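
The outcome label in the Methods (eGFR <60 mL/min/1.73 m², CKD-EPI 2021) can be illustrated with a short sketch. The abstract does not show the authors' code, so what follows is only an assumed implementation of the published CKD-EPI 2021 creatinine equation; the function names, the "female"/"male" encoding, and the use of serum creatinine in mg/dL are illustrative choices, not details taken from the preprint.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl: float, age_years: float, sex: str) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the CKD-EPI 2021 creatinine equation.

    Hypothetical helper: `sex` is assumed to be "female" or "male".
    """
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    # Sex-specific constants of the CKD-EPI 2021 equation.
    kappa = 0.7 if sex == "female" else 0.9
    alpha = -0.241 if sex == "female" else -0.302
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if sex == "female":
        egfr *= 1.012
    return egfr


def has_ckd_risk(serum_creatinine_mg_dl: float, age_years: float, sex: str,
                 threshold: float = 60.0) -> bool:
    """Binary label used as the prediction target: eGFR below the threshold."""
    return egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, sex) < threshold


if __name__ == "__main__":
    # Illustrative values only, not records from the study dataset.
    print(round(egfr_ckd_epi_2021(1.0, 50, "male"), 1))
    print(has_ckd_risk(2.0, 70, "male"))
```

In this setup creatinine is needed only to build the training label; the eight input features listed in the Methods do not include it.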

Version published to 10.21203/rs.3.rs-7888843/v1 on Research Square
Nov 13, 2025

Related articles

Heart Disease Detection with Machine Learning Algorithms

This article has 2 authors:
1. Fatemeh Hosseinabadi
2. Seyedhassan Sharifi
This article has no evaluations. Latest version Jan 6, 2026

Comparative Evaluation of Classification and Regression Algorithms for Chronic Kidney Disease Assessment Using Clinical and Laboratory Features

This article has 4 authors:
1. Rohit Rohit
2. Priyanka Priyanka
3. Kavya Mishra
4. Jaya Kaushiki Mishra
This article has no evaluations. Latest version Dec 30, 2025

Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease

This article has 16 authors:
1. Hao Liu
2. Meijun Liu
3. Xinmiao Guan
4. Feng Cao
5. Changhao Liang
6. Zhongwen Qi
7. Jiaqi Hui
8. Junnan Zhao
9. Jingli Xing
10. Jianguo Zhou
11. Dong Zhang
12. Lei Liu
13. Xiaoliang Hao
14. Minjing Luo
15. Fengqin Xu
16. Yutong Fei
This article has no evaluations. Latest version Jan 12, 2026