Predicting rifampicin resistance in M. tuberculosis using machine learning informed by protein structural and chemical features


Abstract

Background

Rifampicin remains a key antibiotic in the treatment of tuberculosis. Despite advances in cataloguing resistance-associated variants (RAVs), novel and rare mutations in the relevant gene, rpoB, will be encountered in clinical samples, complicating the use of genetics to predict whether a sample is resistant to rifampicin. We have trained a series of machine learning models with the aim of complementing genetics-based drug susceptibility testing.

Methods

We built a Test+Train dataset comprising 219 susceptible mutations and 46 RAVs. We considered features derived from the structure of the RNA polymerase or from the change in chemistry introduced by the mutation; however, only a few, notably the distance from the rifampicin binding site, were found to be predictive on their own. Due to the paucity of RAVs, we used Monte Carlo cross-validation with 50 repeats to train four different machine learning models.
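As an illustration of the validation scheme named above, the following is a minimal sketch of Monte Carlo cross-validation (repeated random train/test splits) using the class counts and number of repeats given in the abstract. The function name `monte_carlo_splits`, the 30% test fraction, the per-class stratification and the random seed are assumptions made for this example; the abstract does not state them, and this is not the authors' code.

```python
import random


def monte_carlo_splits(labels, n_repeats=50, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Each class is split separately (stratification, an assumption here) so
    that the rare resistant class appears in every test set.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    for _ in range(n_repeats):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Class counts from the abstract: 219 susceptible (0) and 46 RAVs (1).
labels = [0] * 219 + [1] * 46
splits = list(monte_carlo_splits(labels))
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap; a model would be fitted on each `train` set and scored on the matching `test` set, with the scores then summarised across the 50 repeats.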

Results

All four models behaved similarly, with sensitivities in the range 0.84-0.88 and specificities in the range 0.94-0.97; we preferred the ensemble of Decision Tree models because they are easy to inspect and understand. We showed that measuring distances from molecular dynamics simulations did not improve performance.
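For reference, sensitivity and specificity can be computed from predicted and true labels as sketched below. Treating resistant as the positive class (label 1) is an assumption of this example, and the toy labels are hypothetical values chosen only to exercise the function; they are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, predicted resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, predicted susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, flagged resistant
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 4 resistant and 6 susceptible mutations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of resistant mutations correctly identified, and specificity is the fraction of susceptible mutations correctly identified.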

Conclusions

It is possible to predict whether a mutation in rpoB confers resistance to rifampicin using a machine learning model trained on a combination of structural, chemical and evolutionary features; however, performance is moderate and training is complicated by the lack of data.
