Machine Learning-Based Forecasting of Tuberculosis Incidence in Taiwan: A Comprehensive Comparison of Traditional and Deep Learning Approaches with Projections to 2035

Mei-Mei Kuan¹

Abstract

Background
Tuberculosis (TB) remains a significant global public health challenge, with 10.8 million incident cases in 2023. Accurate forecasting is crucial for resource allocation and for evaluating progress toward the WHO End TB Strategy targets. This study developed and validated multiple machine learning models to forecast TB cases in Taiwan through 2035.

Methods
We analyzed 17 years of monthly TB surveillance data (January 2008–July 2025, n = 206 observations) from Taiwan's national electronic TB register. Five modeling approaches were systematically evaluated: Random Forest, XGBoost, LightGBM, ensemble methods (including a novel 70% XGBoost + 30% LightGBM hybrid), and hybrid LSTM-CNN deep learning architectures. Models incorporated temporal features, autoregressive lags (1, 3, and 6 months), rolling averages, and stratified demographic data (age, gender, and migration status). Two age stratification schemes were compared: 7 groups (0–14, 15–24, 25–34, 35–44, 45–54, 55–64, and ≥ 65 years) versus 4 groups (0–24, 25–44, 45–64, and ≥ 65 years). Performance was assessed using expanding-window time-series cross-validation over 36 months (August 2022–July 2025), with metrics including R², RMSE, MAPE, and directional accuracy (Hit Rate). Comprehensive sensitivity analyses evaluated forecast robustness, and scenario analyses explored the impact of interventions on projected incidence.

Results
XGBoost with 7 age groups achieved the best performance (R² = 0.705, RMSE = 60.2, MAPE = 21.7%, Hit Rate = 97.2%), followed by LightGBM (R² = 0.698, RMSE = 61.1, MAPE = 22.0%, Hit Rate = 97.2%) and the ensemble methods (R² = 0.690, RMSE = 61.8, MAPE = 22.2%, Hit Rate = 97.2%). The LSTM-CNN model achieved competitive results with 7 age groups (R² = 0.682, RMSE = 63.4, MAPE = 22.8%, Hit Rate = 94.4%), but its performance degraded with the simplified 4-group stratification. The hybrid ensemble (70% XGBoost + 30% LightGBM) forecasts Taiwan's TB incidence at 14.2 per 100,000 population in 2030 (95% CI: 12.4–16.0) and 14.6 per 100,000 in 2035 (95% CI: 12.7–16.5), representing approximately 3,247 annual cases. This is an approximately 50% decline from the 2023 baseline (28 per 100,000) but falls short of the WHO End TB Strategy targets (< 9 per 100,000 by 2030 and < 4.5 per 100,000 by 2035). Scenario analyses indicate that a 30% case reduction through enhanced interventions could achieve 9.9 per 100,000 by 2035. Sensitivity analyses confirmed forecast robustness, with < 4% variation across model configurations.

Conclusions
Machine learning approaches, particularly gradient boosting methods (XGBoost, LightGBM) and their hybrids, provide accurate and robust TB forecasting for Taiwan. The projected trajectory suggests that Taiwan will maintain its low TB burden but make insufficient progress toward elimination goals under current conditions. Achieving the WHO 2030 and 2035 targets requires intensified interventions, including expanded preventive therapy, enhanced active case finding, and systematic screening of high-risk populations. This validated forecasting pipeline can be institutionalized for routine surveillance, policy planning, and intervention evaluation.
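The evaluation design summarized in the Methods (lag and rolling-mean features, expanding-window one-step-ahead validation over the final 36 months, a 70/30 weighted blend, and the R², RMSE, MAPE, and Hit Rate metrics) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline. Assumptions: the two base learners are hypothetical stand-ins (`fit_stand_in_a`, `fit_stand_in_b`) rather than XGBoost and LightGBM, the series is synthetic, and Hit Rate is assumed to mean agreement in the sign of the month-over-month change, which the abstract does not define.

```python
import math
from statistics import mean

# Blend weights for the hybrid ensemble, as reported in the abstract.
W_XGB, W_LGBM = 0.7, 0.3


def features(y, t):
    """Lags 1, 3, 6 and a 3-month rolling mean, using only months before t."""
    return {"lag1": y[t - 1], "lag3": y[t - 3], "lag6": y[t - 6],
            "roll3": mean(y[t - 3:t])}


def fit_stand_in_a(train):
    # HYPOTHETICAL stand-in for XGBoost: last value plus mean monthly drift.
    drift = mean(b - a for a, b in zip(train, train[1:]))
    return lambda f: f["lag1"] + drift


def fit_stand_in_b(train):
    # HYPOTHETICAL stand-in for LightGBM: rolling mean scaled by the average
    # ratio y[t] / roll3 observed in the training window.
    k = mean(train[t] / mean(train[t - 3:t]) for t in range(3, len(train)))
    return lambda f: k * f["roll3"]


def expanding_window(y, horizon=36):
    """One-step-ahead forecasts for the last `horizon` months; each model is
    refit on all observations before the forecast month."""
    assert len(y) - horizon >= 6, "need at least 6 months of history for lag 6"
    preds = []
    for t in range(len(y) - horizon, len(y)):
        train = y[:t]
        a, b = fit_stand_in_a(train), fit_stand_in_b(train)
        f = features(y, t)
        preds.append(W_XGB * a(f) + W_LGBM * b(f))  # 70/30 weighted blend
    return preds


def sign(v):
    return (v > 0) - (v < 0)


def metrics(actual, pred, prev):
    """prev[i] is the observed value one month before actual[i]."""
    n = len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ybar = mean(actual)
    ss_tot = sum((a - ybar) ** 2 for a in actual)
    # ASSUMED Hit Rate definition: predicted and observed month-over-month
    # changes have the same sign.
    hits = sum(sign(p - q) == sign(a - q)
               for a, p, q in zip(actual, pred, prev))
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, pred)),
        "HitRate": 100 * hits / n,
    }


# Synthetic monthly counts (NOT the Taiwan register data): a declining trend
# with annual seasonality, 206 points to mirror the abstract's series length.
y = [900 - 2.0 * t + 60 * math.sin(2 * math.pi * t / 12) for t in range(206)]
preds = expanding_window(y)
print(metrics(y[-36:], preds, y[-37:-1]))
```

Building features only from months before t, and refitting at every step, is what keeps expanding-window scores free of look-ahead leakage; real gradient-boosting regressors would replace the two hypothetical `fit_stand_in_*` functions, trained on feature rows built the same way. If Hit Rate is computed over all 36 evaluation months, each directional miss costs about 2.8 percentage points, so 97.2% and 94.4% correspond to 35 and 34 hits.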

Version published to 10.21203/rs.3.rs-9223330/v1 on Research Square
Apr 1, 2026

Related articles

Perinatal Mortality Prediction and Risk Factor Identification Using Machine Learning on Recent Sub-Saharan African DHS Data

This article has 8 authors:
1. Tadele Chekol Maru
2. Andualem Enyew
3. Makda Fekadie Tewelgne
4. Eliyas Addisu Taye
5. Agerie Mengistie Zeleke
6. Belayneh Jejaw Abate
7. Deresse Abebe Gebrehana
8. Azanaw Amare Muche
This article has no evaluations. Latest version: Mar 30, 2026

Explainable Machine Learning Model for Predicting Early Neurological Deterioration in Patients with Acute Ischemic Stroke

This article has 3 authors:
1. Tingting Huang
2. Shoucai Zhao
3. Kai Wang
This article has no evaluations. Latest version: Apr 1, 2026

Development and Validation of a Machine Learning Model for Hepatitis C Virus Exposure: A Demographic Screening Approach for the US Population

This article has 5 authors:
1. Dorian G Ding
2. Taoyi Chen
3. Yu Sheng
4. Jeffrey S.H. Lin
5. Ye Yuan
This article has no evaluations. Latest version: Apr 15, 2026
