Predicting Iron Deficiencies Using Routine Complete Blood Cell Count Parameters: A Machine Learning Approach and Evaluation
Abstract
Background/Objectives: Iron deficiency remains a prevalent condition whose diagnosis requires specific laboratory tests. This study aimed to evaluate whether routine complete blood cell count (CBC) parameters can be used within a machine learning framework to predict iron deficiency, potentially optimizing laboratory test utilization. Methods: A retrospective dataset of outpatients (2023–2026) who underwent both CBC and iron testing was analyzed. Iron deficiency was defined using sex-specific thresholds for ferritin and transferrin saturation. After data cleaning and exclusion of incomplete records, demographic variables and CBC indices were tested as potential predictors. The dataset was split into training and test sets with stratified sampling. Multiple supervised machine learning models, including logistic regression, decision tree, random forest, XGBoost, support vector machine, k-nearest neighbors, and Naive Bayes, were trained. Hyperparameter tuning and model selection were performed using repeated stratified 10-fold cross-validation, optimizing the area under the curve (AUC). Model performance was assessed by AUC, sensitivity, and specificity, and validated on an independent test set. Results: All models demonstrated predictive capability using CBC parameters alone. Ensemble methods, especially random forest and XGBoost, achieved the best performance (AUC values of 0.80–0.87 for ferritin and 0.85–0.96 for transferrin saturation). Sensitivity and specificity were balanced, supporting applicability to clinical screening. Performance was consistent across cross-validation and was confirmed in the test set. Prediction of transferrin saturation showed slightly higher accuracy than prediction of ferritin. Feature importance analysis identified MCV, MCH, and RDW as key predictors. Conclusions: CBC-based machine learning models can reliably identify subjects with iron deficiency, supporting subsequent, more targeted analyses.
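To make the three reported metrics concrete, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed for a binary iron-deficiency classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = iron deficient."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Assumed decision threshold of 0.5; the abstract does not state one.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
model_auc = auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={model_auc:.4f}")
```

AUC depends only on how the model ranks cases, while sensitivity and specificity depend on the chosen threshold, which is why a screening application would typically report all three.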