A multi-stage machine learning framework for stepwise prediction of tuberculosis treatment outcomes: Integrating gradient boosted decision trees and feature-level analysis for clinical decision support

Abstract

Tuberculosis (TB) remains a global health crisis, with multidrug-resistant (MDR-TB) and extensively drug-resistant (XDR-TB) strains posing significant challenges to treatment. With the increasing availability of clinical and diagnostic data, artificial intelligence methods have the potential to improve treatment strategies and patient outcomes. In this study, we used the TB Portal database, which includes clinical, radiological, demographic, and genomic data from 15,997 patients across high-burden countries, to develop a machine learning model based on gradient-boosted decision trees for predicting tuberculosis treatment outcomes (e.g., success or failure). Built with the open-source XGBoost library, our model categorises features into four temporally defined diagnostic stages (pre-treatment, microbiological, post-imaging, and treatment) that align with the typical clinical workflow to support real-time decision-making. This stepwise framework enables the model to incorporate data progressively as they become available while maintaining robust predictive performance, even in the presence of the missing values typical of real-world healthcare settings. The model achieved high predictive performance (AUC-ROC: 0.96, F1-score: 0.94), with key predictors including age of onset, drug resistance, and treatment adherence. Regional analysis revealed variability in performance, underscoring the potential for localised model adaptation. By accommodating missing data at various diagnostic stages, our model provides actionable insights for personalised TB treatment strategies and supports clinical decision-making in diverse and resource-constrained contexts.
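
As an illustration of the stepwise idea only, the following minimal sketch shows one way features could be hidden until their diagnostic stage has been reached. It is not the authors' implementation: the stage order is taken from the abstract, while the feature names, the feature-to-stage mapping, and the helper `mask_to_stage` are hypothetical. Features from later stages are set to NaN, which XGBoost treats as missing by default.

```python
import math

# Stage order as listed in the abstract.
STAGES = ["pre-treatment", "microbiological", "post-imaging", "treatment"]

# Hypothetical feature-to-stage mapping, for illustration only.
FEATURE_STAGE = {
    "age_of_onset": "pre-treatment",
    "drug_resistance": "microbiological",
    "lung_cavity_size": "post-imaging",
    "treatment_adherence": "treatment",
}


def mask_to_stage(record, stage):
    """Return a copy of `record` with features from later stages set to NaN."""
    cutoff = STAGES.index(stage)
    return {
        name: (value if STAGES.index(FEATURE_STAGE[name]) <= cutoff else math.nan)
        for name, value in record.items()
    }


# Hypothetical patient record with all four stages observed.
patient = {
    "age_of_onset": 34.0,
    "drug_resistance": 1.0,
    "lung_cavity_size": 2.0,
    "treatment_adherence": 0.8,
}

# View of the same patient once only microbiological results are available.
early_view = mask_to_stage(patient, "microbiological")
```

Under these assumptions, a single model could be queried at any point in the workflow by passing the masked record, with later calls filling in more fields as results arrive.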
