Construction and Validation of an Interpretable Machine Learning Model with SHAP for Identifying Infectious Diseases in Fever of Unknown Origin

Fei Li
Xu Zhang
Juan Zhang
Yang Yu
Jie Yang

Abstract

Background: Diagnosis and management of fever of unknown origin (FUO) continue to represent a major challenge in clinical practice. Categorizing FUO etiologies into infectious and non-infectious disorders provides a rationale for clinicians to implement a stepwise, stratified screening approach and informs decisions regarding antibiotic administration. This study aimed to develop and validate a diagnostic model for distinguishing infectious from non-infectious etiologies in adult patients with FUO.

Methods: This retrospective study was conducted at a tertiary hospital in North China. A total of 630 adult FUO patients were initially enrolled, and 484 patients were finally included for model development and validation. The Boruta algorithm and LASSO regression were applied for feature selection to identify key predictive variables from demographic characteristics, medical histories, and laboratory parameters. Six machine learning models were developed and evaluated using nested 10-fold cross-validation in the training cohort. Model performance was assessed using multiple metrics, including the area under the receiver operating characteristic curve (AUC), precision-recall curve, calibration, and clinical utility. Model validation was performed in an independent test cohort.

Results: Significant predictors of infectious etiologies included procalcitonin, duration of fever, hematocrit, platelet-to-lymphocyte ratio (PLR), platelet count, erythrocyte sedimentation rate, hemoglobin, thrombin time, and smoking status. The XGBoost model achieved the best performance, with an AUC of 0.875 in the validation set and 0.893 in the test set, indicating favorable discriminative ability and clinical applicability. SHapley Additive exPlanations (SHAP) analysis enhanced model interpretability. Notably, PLR also exhibited predictive value for identifying the etiologies of FUO.

Conclusions: We constructed a reliable XGBoost model based on a single-center cohort and interpretable clinical features to predict the risk of infectious etiologies in adult FUO patients. This model may provide clinicians with valuable guidance for diagnostic strategies, rational antibiotic use, and workflow optimization.
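
For readers unfamiliar with two quantities named in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes the standard definitions: PLR as platelet count divided by absolute lymphocyte count, and AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function names and the toy values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def platelet_to_lymphocyte_ratio(platelets: float, lymphocytes: float) -> float:
    """PLR = platelet count / absolute lymphocyte count (same units, e.g. 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets / lymphocytes


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic toy values for illustration only; not data from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
toy_plr = platelet_to_lymphocyte_ratio(250.0, 2.0)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would be the usual choice for larger data.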

Version published to 10.21203/rs.3.rs-8980130/v1 on Research Square
Apr 9, 2026

Related articles

Development and validation of an interpretable machine learning model for predicting in-hospital mortality in patients with ventricular fibrillation

This article has 9 authors:
1. Chengdi Chen
2. Kaixiang Zhang
3. Tongchun Zhong
4. Haochun Li
5. Zibei Feng
6. Zhijian Guo
7. Zhixiong Yang
8. Shian Huang
9. Lingpin Pang
This article has no evaluations. Latest version: Mar 26, 2026

Development and validation a machine learning model based on clinical factors to predict short-term prognosis of ICU intracerebral hemorrhage patients: a retrospective study

This article has 3 authors:
1. Hanbo Liu
2. Weigao Liu
3. Ping Xue
This article has no evaluations. Latest version: Mar 25, 2026

Development and Validation of a Machine Learning Model for Hepatitis C Virus Exposure: A Demographic Screening Approach for the US Population

This article has 5 authors:
1. Dorian G Ding
2. Taoyi Chen
3. Yu Sheng
4. Jeffrey S.H. Lin
5. Ye Yuan
This article has no evaluations. Latest version: Apr 15, 2026
