Finding the known unknowns: minimal machine learning models of resistance identify novel antibiotic resistance discovery opportunities in Klebsiella pneumoniae
Abstract
Bacterial antimicrobial resistance (AMR) poses a significant public health threat. Growing global awareness and affordable whole-genome sequencing have produced an ever-growing collection of bacterial genome sequence datasets with corresponding antibiotic resistance metadata. This enables the use of computational techniques, including machine learning (ML), to predict phenotypes and discover novel AMR-associated variants. Given the wide variety of resistance mechanisms to interrogate and the large number of datasets available for mining, it is important to identify where novel AMR marker discovery is most needed. Multiple databases and annotation pipelines exist to identify AMR variants known to be associated with resistance to specific antibiotics or antibiotic classes. However, the completeness of these databases varies, and for some antibiotics even the most complete databases remain insufficient for accurate classification. Here, we couple these pipelines with predictive ML models, which we call “minimal models” of resistance. We predict binary resistance phenotypes for 20 major antimicrobials in the genomically diverse pathogen Klebsiella pneumoniae. We present a detailed comparison of the annotation pipelines and drug resistance databases currently available, and we identify their shortcomings in phenotype prediction, highlighting opportunities for novel marker discovery. We further describe a Bacterial and Viral Bioinformatics Resource Center (BV-BRC) database, highlighting the observed AMR mechanisms as key to phenotype prediction in this dataset. This analysis is relevant to anyone seeking to use or improve drug resistance databases. It provides a critical review of the differences between annotation tools and databases commonly used in bacterial AMR studies, identifying existing gaps and niches for novel AMR marker discovery.
It also outlines guidance for establishing a true standard dataset for the development and benchmarking of ML models of AMR.
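To make the idea of a "minimal model" and its gap analysis concrete, here is a minimal sketch under stated assumptions. It assumes a hypothetical per-isolate record listing the known AMR markers an annotation pipeline found in each genome, plus a binary lab phenotype. In place of the study's trained ML classifiers it uses a simple rule as a stand-in: predict resistant if the genome carries any database marker for the drug. The names `Isolate`, `minimal_predict`, `ratio` and `evaluate` are hypothetical. The gene names are real resistance gene families chosen purely for illustration, and the isolates are invented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Isolate:
    """Hypothetical record: one genome, its annotated known AMR markers and its lab phenotype."""
    isolate_id: str
    markers: set[str] = field(default_factory=set)
    resistant: bool = False  # True = resistant (R), False = susceptible (S)


def minimal_predict(isolate: Isolate, known_markers: set[str]) -> bool:
    """Rule-based stand-in for a minimal model: resistant if any known marker is present."""
    return bool(isolate.markers & known_markers)


def ratio(num: int, den: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else math.nan


def evaluate(isolates: list[Isolate], known_markers: set[str]) -> tuple[dict, list[str]]:
    """Score the minimal model and collect resistant isolates that carry no known marker."""
    tp = fp = tn = fn = 0
    unexplained: list[str] = []
    for iso in isolates:
        pred = minimal_predict(iso, known_markers)
        if pred and iso.resistant:
            tp += 1
        elif pred:
            fp += 1  # known marker present, but the phenotype is susceptible
        elif iso.resistant:
            fn += 1  # resistant with no known marker: a marker discovery candidate
            unexplained.append(iso.isolate_id)
        else:
            tn += 1
    metrics = {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(isolates)),
    }
    return metrics, unexplained


# Illustrative inputs: a marker set for one carbapenem and five invented isolates.
known = {"blaKPC-2", "blaNDM-1", "blaOXA-48"}
isolates = [
    Isolate("iso1", {"blaKPC-2"}, True),
    Isolate("iso2", {"blaSHV-11"}, False),
    Isolate("iso3", set(), True),
    Isolate("iso4", {"blaOXA-48"}, False),
    Isolate("iso5", {"blaNDM-1", "blaCTX-M-15"}, True),
]

metrics, unexplained = evaluate(isolates, known)
print(metrics)
print("Resistant isolates with no known marker:", unexplained)
```

On these toy inputs the rule reaches a sensitivity of 2/3 and flags `iso3` as resistant but unexplained by the database. A trained classifier would replace the fixed rule, but the logic of the gap analysis carries over: where a model restricted to known markers misses resistant isolates, the database is incomplete for that drug, and those isolates indicate where novel marker discovery is most needed.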