Expanding Tuberculosis Drug Resistance Prediction beyond binary: Deep Learning for Minimum Inhibitory Concentration prediction
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a major global health concern, with case numbers increasing since 2021. In 2024, over 10 million new cases were reported, along with 1.1 million deaths¹. At the same time, the widespread adoption of whole genome sequencing (WGS) has made it possible to predict drug resistance from genetic data, allowing for faster diagnostic workflows. Several tools have been developed to classify isolates as "resistant" or "susceptible". However, drug resistance is not always binary and often exists along a continuum. Cryptic resistance is one example, in which phenotypic resistance occurs in the absence of known resistance mutations. The CRyPTIC dataset, which includes over 12,000 isolates and minimum inhibitory concentration (MIC) measurements across 12 anti-TB drugs, provides a valuable resource for moving beyond binary classification. The MIC is the lowest concentration of a drug needed to inhibit bacterial growth, so it offers a more detailed picture of drug susceptibility than a binary label. In this study, we build models that extend binary classification to directly predict MIC levels from genomic features. To handle the severe class imbalance and uneven completeness of the MIC distributions, especially for newer or second-line drugs, we use oversampling and label-aware training techniques. We compare two modelling approaches: XGBoost, which is well suited to structured data, and convolutional neural networks (CNNs), which can capture spatial and hierarchical relationships within genomic inputs. XGBoost performed more consistently in the presence of imbalance, while CNNs achieved higher resolution when the MIC classes were more evenly distributed. Feature importance analysis revealed that some variants previously thought to cause resistance were linked to lower MIC values, suggesting that they may confer only low-level resistance, in which the isolate can still be killed by a higher dosage of the same drug.
These insights open the door to more tailored treatment strategies, including the use of higher doses of first-line drugs, which could reduce toxicity, improve patient adherence, and slow the emergence of resistance to newer therapies.
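The imbalance handling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the two-fold dilution binning, the `lowest` concentration, and the toy variant vectors are all assumptions made for illustration. The sketch maps MIC values to ordinal classes, randomly oversamples minority classes until every class matches the largest one, and computes inverse-frequency class weights as one possible form of label-aware training.

```python
import math
import random
from collections import Counter


def mic_to_class(mic_mg_per_l, lowest=0.25):
    """Map an MIC (mg/L) to an ordinal class index.

    Assumes a two-fold dilution series starting at `lowest`
    (a hypothetical lowest tested concentration).
    """
    return round(math.log2(mic_mg_per_l / lowest))


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in sorted(by_class.items()):
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so that rare MIC
    classes contribute more to a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


# Toy data (hypothetical): each row is a presence/absence vector of variants.
mics = [0.25, 0.25, 0.25, 0.25, 0.5, 2.0]
variants = [[0, 0], [0, 0], [0, 1], [0, 0], [1, 0], [1, 1]]

labels = [mic_to_class(m) for m in mics]
balanced_x, balanced_y = random_oversample(variants, labels)
weights = inverse_frequency_weights(labels)
```

In practice, oversampling and class weighting are usually alternatives rather than combined, and either could feed a gradient-boosted or convolutional classifier; which of these the study used, and how, is not specified in the abstract.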