Simplifying the Diagnosis of Tuberculous Pleural Effusion: A Machine Learning Analysis of ADA and Lymphocyte Percentage in 1134 Patients
Abstract
Background: Diagnosing tuberculous pleural effusion (TPE) is often complicated by features that overlap with other causes of pleural effusion. Adenosine deaminase (ADA) and lymphocyte percentage (LYM%) are widely used biomarkers, but the diagnostic value of each in isolation remains limited.
Methods: We retrospectively enrolled 1134 patients with confirmed pleural effusion (615 TPE, 519 non-TPE) from two Chinese hospitals between 2021 and 2025. Nine pleural fluid parameters were analyzed. The dataset was divided into training (70%), validation (15%), and test (15%) sets. We developed four machine learning (ML) models: logistic regression (LR), random forest (RF), Light Gradient Boosting Machine (LightGBM), and support vector machine (SVM). We compared their diagnostic performance with that of logistic models based on ADA alone, LYM% alone, and the two combined. The DeLong test was used to compare AUCs.
Results: All pleural fluid parameters, including red blood cells, differed significantly between the TPE and non-TPE groups (p < 0.05). The RF model achieved the highest AUC (0.946), followed by LightGBM (0.945), SVM (0.945), and LR (0.934). ADA + LYM% (AUC = 0.928) outperformed ADA alone (0.815) and LYM% alone (0.905), and showed no significant difference from the full-feature RF model (p = 0.181). Both ADA and LYM% were strong positive predictors in all models.
Conclusions: A minimal logistic model based on ADA and LYM% demonstrates excellent diagnostic performance for TPE, comparable to that of more complex machine learning models. This simple and interpretable approach is well suited to routine clinical application.
Trial registration: Not applicable. This retrospective diagnostic study was not registered as a clinical trial.
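The 70/15/15 partition and the AUC metric described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say whether the split was stratified by diagnosis, so the per-class split here is an assumption, and the function names `split_70_15_15` and `auc` are hypothetical. The AUC is computed directly from its rank interpretation (the probability that a randomly chosen TPE case scores higher than a randomly chosen non-TPE case). The DeLong comparison is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def split_70_15_15(labels: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, validation, test) index lists, split roughly 70/15/15.

    Assumption: the split is done within each class (stratified); the
    abstract does not state this.
    """
    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(0.70 * len(idx))
        n_val = round(0.15 * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class sizes taken from the abstract: 615 TPE (label 1), 519 non-TPE (label 0).
labels = [1] * 615 + [0] * 519
train_idx, val_idx, test_idx = split_70_15_15(labels)

# Toy scores (illustrative only, not study data) to show the AUC call.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice the scores passed to `auc` would be the predicted probabilities of a fitted model (for example, a logistic regression on ADA and LYM%) evaluated on the held-out test indices.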