Household Clustering of High-Risk Contacts in Smear-Positive TB Patient Families: Evidence for Hotspot Households and Risk Stratification in Rural Eastern Cape
Abstract
Background: Household contacts of smear-positive tuberculosis (TB) patients face a high risk of infection and disease progression, particularly children and members of large, crowded families. Despite WHO recommendations for systematic screening and preventive therapy (TPT), implementation in high-burden rural settings remains limited. This study aimed to develop a practical framework for prioritizing high-risk families by evaluating demographic predictors, household clustering, and machine learning models.
Methods: We analyzed 428 household contacts across 20 families of smear-positive index cases. Screening outcomes were categorized as high or low risk. We performed descriptive statistics, χ² tests, and logistic regression. Visualization methods, including network diagrams, heatmaps with clustering, and risk-ranking bar charts, were used to explore household-level clustering. Machine learning models (logistic regression, random forest, gradient boosting) were trained using age, gender, screening status, and household size, employing 5-fold cross-validation and an 80/20 hold-out set.
Results: Of the 428 contacts, 281 (65.7%) were classified as high risk. Age group was significantly associated with high-risk status (χ² = 21.4, p < 0.001), with children 0.75). Gradient boosting was the top-performing machine learning model (cross-validated AUROC = 0.65 ± 0.03; AUPRC = 0.76 ± 0.04), demonstrating good calibration (Brier score = 0.21) and net clinical benefit within 0.2–0.6 risk thresholds via decision-curve analysis.
Conclusion: TB risk is strongly clustered within families, with large, child-dominated households being most vulnerable. The integrated framework, which combines statistical analysis, household visualization, and machine learning, offers a practical, added-value tool for prioritizing families and directing limited resources effectively. These findings reinforce the WHO's family-centered approach and underscore the importance of integrating clinical governance and community-engaged education into TB control strategies.
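The abstract reports a Brier score and a decision-curve analysis without stating how either quantity is computed. The sketch below uses the standard textbook definitions (Brier score as mean squared error of predicted probabilities; net benefit as TP/n − FP/n × pt/(1 − pt) at threshold pt). It is not the authors' code: the function names and the toy predictions are hypothetical, and the only figure taken from the abstract is the high-risk prevalence of 281/428.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every contact whose predicted risk >= threshold."""
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y, _ in flagged if y == 1)  # flagged and truly high risk
    fp = sum(1 for y, _ in flagged if y == 0)  # flagged but low risk
    odds = threshold / (1.0 - threshold)       # weight given to a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the 'flag everyone' reference strategy."""
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Prevalence reported in the abstract: 281 of 428 contacts were high risk.
PREVALENCE = 281 / 428

# Reference point: a constant forecast equal to the prevalence has
# Brier score p * (1 - p), roughly 0.225 for this cohort.
BASELINE_BRIER = PREVALENCE * (1.0 - PREVALENCE)

# Hypothetical toy predictions for illustration only (NOT study data).
toy_outcomes = [1, 1, 1, 0, 1, 0, 1, 0]
toy_risks = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print("toy Brier score:", brier_score(toy_outcomes, toy_risks))
    for pt in (0.2, 0.4, 0.6):
        print(
            pt,
            "model:", net_benefit(toy_outcomes, toy_risks, pt),
            "treat-all:", net_benefit_treat_all(PREVALENCE, pt),
        )
```

In a decision curve, a model adds value at a given threshold only where its net benefit exceeds both the flag-everyone line and zero (flag no one). With a prevalence near 0.66, the flag-everyone strategy stays positive until the threshold reaches the prevalence, so that line is the relevant comparator across most of the 0.2–0.6 range.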