A Machine Learning Model Based on CBC-Derived Parameters to Distinguish Benign from Malignant Lymphoproliferative Disorders

Jing Jing¹
Xiaoyan Hao¹
Yanjun Diao
Xiang Cheng
Xiaoxia Gao
Bin Huang
Yun Yang
Enliang Hu
Yuan Zhao
Jingyuan Jia
Jiayun Liu

Abstract

Objective: Infectious mononucleosis (IM) and malignant lymphoproliferative disorders often present with similar initial symptoms and signs. The purpose of this study is to develop a machine learning-based model to distinguish IM from lymphoid hematologic malignancies using both routine and research parameters derived from complete blood count (CBC) analysis.

Methods: This multicenter model development and validation study used data from three independent institutions. Patients with a final confirmed diagnosis of IM, acute lymphoblastic leukemia (ALL), or chronic lymphoproliferative disorders (CLPD) were included. A total of 24 routine and 21 report-derived CBC parameters were collected at initial presentation. Nine candidate biomarkers and five machine learning classifiers were used to construct predictive models from the training data. The models were validated on an independent test dataset. Model performance was assessed by calculating the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient. The diagnostic performance of all five models was evaluated using both internal ROC analysis and external validation datasets.

Results: A total of 114 patients with IM, 108 with hematologic diseases, and 150 healthy controls were included in the study. In both the IM versus hematologic disease classification model and the IM versus healthy control model, all methods except the decision tree (DT) achieved good evaluation metrics, most of them above 87% in the validation cohorts, suggesting reliable diagnostic capability. In the classification model distinguishing healthy individuals from patients with hematologic diseases, the XGBoost algorithm showed stable and high performance across the training, validation, and test sets. The receiver operating characteristic (ROC) curves confirmed that XGBoost was the most effective model, with an AUC of 0.995 in the test sets.

Conclusion: The XGBoost model demonstrated the most satisfactory performance. Machine learning algorithms show promise for clinical implementation, and the proposed model may aid in the early identification of IM and malignant lymphoid disorders with overlapping initial presentations. This offers clinicians valuable assistance in intervening early and improving prognosis.
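The evaluation metrics named in the abstract can all be derived from a binary confusion matrix, plus a ranking of prediction scores for the AUC. The sketch below is not the authors' code; it is a minimal, dependency-free illustration of the standard definitions. The function names (`ratio`, `binary_metrics`, `auc_from_scores`) and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt, nan


def ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from the four cells of a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    accuracy = ratio(tp + tn, n)
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": accuracy,
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "mcc": ratio(
            tp * tn - fp * fn,
            sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
        "kappa": ratio(accuracy - p_expected, 1 - p_expected),
    }


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return ratio(wins, len(positives) * len(negatives))


# Illustrative counts only; these are NOT results from the study.
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC and Cohen's kappa both account for class imbalance and chance agreement, which is presumably why they are reported alongside the simpler rates when the cohorts differ in size.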

Version published to 10.21203/rs.3.rs-7283529/v1 on Research Square
Sep 3, 2025

Related articles

Development of Machine Learning Algorithms for Predicting Vitamin B12 Levels Using Biochemical Analyte Data

This article has 3 authors:
1. Ferhat Demirci
2. Oktay YILDIRIM
3. Pınar AKAN
This article has no evaluations. Latest version: Jan 2, 2026

RETRACTED: Development and Validation of a Simplified Machine Learning Model Based on T-SPOT.TB and Routine Clinical Data for the Diagnosis of Tuberculous Pleural Effusion

This article has 5 authors:
1. Shuangyin Yang
2. Kuiliang Yang
3. Lizhi Wang
4. Jie Pu
5. Pu Wang
This article has no evaluations. Latest version: Dec 12, 2025

An enhanced explainable thyroid disease diagnosis by leveraging cluster-smote and machine learning models

This article has 4 authors:
1. Usman Suleh
2. Badamasi Alhaji Ahmed
3. Farouk Lawan Gambo
4. Fatima Umar Zambuk
This article has no evaluations. Latest version: Jan 27, 2026
