Study and implementation of a new machine learning algorithm to predict drug resistance in Mycobacterium tuberculosis complex clinical isolates

Abstract

Background Tuberculosis is a major public health problem, and the diagnosis of multidrug-resistant and extensively drug-resistant tuberculosis is a global health priority. This resistance is mainly caused by mutations in genes coding for drug targets or conversion enzymes, but knowledge of these mutations is incomplete. Whole-genome sequencing (WGS) is an increasingly common approach for rapidly characterizing isolates and identifying mutations predictive of drug resistance. It also promises to circumvent long delays in results; however, this technique does not account for the evolution of resistance. In contrast, machine learning methods have been widely applied to predict the resistance of Mycobacterium tuberculosis (MTB) to a specific drug in a timely manner and even to identify resistance markers. Our main objective was to explore statistical learning algorithms in order to build a fast and powerful model for predicting drug sensitivity and resistance, thus providing a diagnostic tool to aid clinical decision making.

Methods Machine learning approaches were applied to 28,073 (22,458 for training and 5,614 for testing) M. tuberculosis isolates that underwent WGS analysis and laboratory drug susceptibility testing (DST) for 10 antituberculosis drugs. The data for these isolates were collected from the National Center for Biotechnology Information (NCBI) database. Two boosting models, extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM), and a deep neural network model with a new architecture were used to predict drug resistance. A separate model was fitted for each drug, and the 10 most influential feature classes were explored and used as input features during training to obtain satisfactory performance. Predictive performance was measured using sensitivity, specificity, the F1 score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC).

Results All three tools reliably predicted drug resistance. Compared with algorithms from other studies, they achieved higher AUCs for at least 6 drugs, most notably ethambutol (EMB), kanamycin (KAN), capreomycin (CAP), amikacin (AMK), streptomycin (STR), and ethionamide (ETH). Overall, the best performing model was the deep learning model, which outperformed all existing direct association-based approaches as well as the previously reported machine learning models, with AUCs ranging from 0.97 to 0.99 for 9 drugs.

Conclusion This work demonstrated the power of machine learning as a flexible approach for drug resistance prediction. This tool is able to consider a significant number of predictors and summarize their predictive ability, facilitating clinical decision making and detection of single-nucleotide polymorphisms in the era of increasing WGS data generation.
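The evaluation metrics named in the Methods (sensitivity, specificity, F1 score, and AUC) can be illustrated with a minimal sketch for a single drug. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration, with 1 standing for a resistant isolate and 0 for a susceptible one.

```python
from typing import Sequence, Tuple


def sensitivity_specificity_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, F1) for binary labels, 1 = resistant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, specificity, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a resistant isolate receives a higher
    score than a susceptible one, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one isolate of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, f1 = sensitivity_specificity_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} F1={f1:.2f} AUC={auc:.4f}")
```

Sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores alone, which is one reason the two kinds of metric are usually reported together.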
