Construction and Validation of an Interpretable Machine Learning Model with SHAP for Identifying Infectious Diseases in Fever of Unknown Origin

Abstract

Background: Diagnosis and management of fever of unknown origin (FUO) remain a major challenge in clinical practice. Categorizing FUO etiologies into infectious and non-infectious disorders gives clinicians a rationale for stepwise, stratified screening and informs decisions about antibiotic administration. This study aimed to develop and validate a diagnostic model that distinguishes infectious from non-infectious etiologies in adult patients with FUO.
Methods: This retrospective study was conducted at a tertiary hospital in North China. A total of 630 adult FUO patients were initially enrolled, and 484 were included in model development and validation. The Boruta algorithm and LASSO regression were applied for feature selection to identify key predictive variables from demographic characteristics, medical histories, and laboratory parameters. Six machine learning models were developed and evaluated using nested 10-fold cross-validation in the training cohort. Model performance was assessed using multiple metrics, including the area under the receiver operating characteristic curve (AUC), the precision-recall curve, calibration, and clinical utility. Model validation was performed in an independent test cohort.
Results: Significant predictors of infectious etiologies included procalcitonin, duration of fever, hematocrit, platelet-to-lymphocyte ratio (PLR), platelet count, erythrocyte sedimentation rate, hemoglobin, thrombin time, and smoking status. The XGBoost model performed best, with an AUC of 0.875 in the validation set and 0.893 in the test set, indicating favorable discriminative ability and clinical applicability. SHapley Additive exPlanations (SHAP) analysis enhanced model interpretability. Notably, PLR also exhibited predictive value for identifying the etiologies of FUO.
Conclusions: We constructed a reliable XGBoost model based on a single-center cohort and interpretable clinical features to predict the risk of infectious etiologies in adult FUO patients. This model may provide clinicians with valuable guidance for diagnostic strategies, rational antibiotic use, and workflow optimization.
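
For readers less familiar with the quantities named in the abstract, the following is a minimal sketch of two of them: the platelet-to-lymphocyte ratio (PLR) and the AUC. It is not the authors' code. The function names and the toy values are hypothetical and chosen only for illustration, and the AUC is computed by direct pairwise comparison (the probability that a randomly chosen infectious case receives a higher score than a randomly chosen non-infectious case), which is equivalent to the area under the ROC curve.

```python
def platelet_to_lymphocyte_ratio(platelets, lymphocytes):
    """PLR: platelet count divided by absolute lymphocyte count.

    Both counts must be in the same unit (for example 10^9/L).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets / lymphocytes


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (here: infectious), 0 otherwise.
    scores: model outputs, higher meaning more likely positive.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values, invented for illustration only (not data from the study).
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.5, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_plr = platelet_to_lymphocyte_ratio(250.0, 1.25)
```

In the toy example, five of the six positive/negative pairs are ranked correctly, so the AUC is 5/6. The pairwise form is quadratic in the number of cases and is meant for clarity; a rank-based implementation from a standard library would be the usual choice on a real cohort.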
