Development and Validation of a Simplified Machine Learning Model Based on T-SPOT.TB and Routine Clinical Data for the Diagnosis of Tuberculous Pleural Effusion

Abstract

Objective: Diagnosing tuberculous pleural effusion (TPE) remains a significant clinical challenge. This study aimed to develop and validate a simplified, accurate, and interpretable machine learning (ML) model for the early diagnosis of TPE using T-SPOT.TB and routine clinical variables.

Methods: A total of 486 patients with pleural effusion (PE) were retrospectively enrolled and randomly divided into training and testing sets at a ratio of 8:2. Demographic and laboratory variables were collected, preprocessed, and analyzed. Feature selection was performed with LASSO regression and the Boruta algorithm. The selected features were used to construct diagnostic models for TPE with five ML algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), and light gradient boosting machine (LGBM). The RF model was interpreted using SHapley Additive exPlanations (SHAP), and a simplified model was developed based on feature importance. The simplified RF model was externally validated, and its calibration and clinical utility were assessed with calibration curves and decision curve analysis.

Results: Among the five algorithms, the RF model performed best in differentiating TPE from non-TPE cases. Using six features, the simplified RF model achieved an area under the curve (AUC) of 0.939, an accuracy of 0.887, a sensitivity of 0.862, a specificity of 0.923, and an F1 score of 0.900. External validation corroborated its diagnostic robustness, with an AUC of 0.917 and an F1 score of 0.898. SHAP analysis identified pleural adenosine deaminase (ADA), blood T-SPOT.TB, and pleural carcinoembryonic antigen (CEA) as the three most important predictors of TPE.

Conclusions: This study established and externally validated a simplified RF model for diagnosing TPE. The model showed high accuracy and strong clinical utility and may aid clinical decision-making in the diagnosis and management of TPE.
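The reported metrics (AUC, accuracy, sensitivity, specificity, F1) all derive from a model's predicted probabilities and the confusion matrix they induce. The following is a minimal Python sketch, using only the standard library, of how such figures can be computed after an 8:2 split. It is not the authors' code. Stratifying the split by class, treating TPE as the positive class (label 1), and using a 0.5 probability threshold are assumptions, since the abstract does not state them. The helper names `stratified_split`, `auc`, and `diagnostic_metrics` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx). Each class is shuffled and split
    separately so the TPE / non-TPE ratio is roughly preserved
    (stratification is an assumption, not stated in the abstract)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive (TPE) case scores higher than a random negative case; ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def _ratio(num: float, den: float) -> float:
    # Undefined ratios (e.g. no predicted positives) become NaN instead of crashing.
    return num / den if den != 0 else math.nan


def diagnostic_metrics(labels: Sequence[int], scores: Sequence[float],
                       threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores (0.5 is an assumed cut-off), build the confusion
    matrix, and derive the metrics reported in the abstract."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)   # recall for the TPE class
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    return {
        "auc": auc(labels, scores),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and sensitivity (recall).
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

With toy labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.6, 0.2, 0.1]`, `diagnostic_metrics` gives an AUC of 8/9 and 2/3 for each of the other four metrics. AUC is threshold-free, whereas the other metrics change with the chosen cut-off. That is why the two kinds of figure are usually reported together.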

Article activity feed