Finding the known unknowns: minimal machine learning models of resistance identify novel antibiotic resistance discovery opportunities in Klebsiella pneumoniae

Kristina Kordova
Caitlin Collins
Julian Parkhill

Listed in

Evaluated articles (Rapid Reviews Infectious Diseases)

Abstract

Bacterial antimicrobial resistance (AMR) poses a significant public health threat. The advent of global awareness and affordable whole-genome sequencing has yielded an ever-growing collection of bacterial genome sequence datasets and corresponding antibiotic resistance metadata. This enables the use of computational techniques, including machine learning (ML), to predict phenotypes and discover novel AMR-associated variants. With the great variety of resistance mechanisms to interrogate and the number of datasets that can be mined, there is a need to identify where novel AMR marker discovery is most necessary. Multiple databases and annotation pipelines exist to identify AMR variants known to be associated with resistance to specific antibiotics or antibiotic classes; however, the completeness of these databases varies, and for some antibiotics even the most complete databases remain insufficient for accurate classification. Here, we couple these pipelines with predictive ML models, which we call “minimal models” of resistance. We predict the binary resistance phenotypes of 20 major antimicrobials in the genomically diverse pathogen Klebsiella pneumoniae. We present a detailed comparison of the annotation pipelines and drug resistance databases currently available, and we identify their shortcomings in phenotype prediction, highlighting opportunities for novel marker discovery. We further provide a description of a Bacterial and Viral Bioinformatics Resource Center (BV-BRC) database, highlighting the observed AMR mechanism as the key for phenotype prediction in this dataset. This analysis has relevance for all those seeking to use or improve drug resistance databases. It provides a critical review of the differences in annotation tools and databases commonly used in bacterial AMR studies, identifying existing gaps and novel AMR marker discovery niches. It outlines guidance for the establishment of a real standard dataset for the development and benchmarking of ML models of AMR.
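
The idea of a "minimal model" described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes that each isolate has already been reduced to presence/absence calls for known AMR markers, it uses a plain logistic regression written with the standard library as a stand-in classifier, and the drug names, marker columns, isolate data, and the 0.9 flagging threshold are all hypothetical. For brevity it scores the model on the same isolates it was fitted on; a real analysis would use held-out data.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Call an isolate resistant (1) when the linear score is positive."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

# Hypothetical toy data: columns are two known markers (1 = present, 0 = absent),
# labels are binary phenotypes (1 = resistant, 0 = susceptible).
datasets = {
    # drug_A: every resistant isolate carries at least one known marker.
    "drug_A": ([[1, 0]] * 2 + [[0, 1]] * 2 + [[1, 1]] + [[0, 0]] * 3,
               [1, 1, 1, 1, 1, 0, 0, 0]),
    # drug_B: three resistant isolates carry no known marker at all.
    "drug_B": ([[1, 0]] * 2 + [[0, 1]] + [[0, 0]] * 7,
               [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
}

THRESHOLD = 0.9  # hypothetical cut-off for "known markers explain the phenotype"

results = {}
for drug, (X, y) in datasets.items():
    w, b = fit_logistic(X, y)
    results[drug] = balanced_accuracy(y, predict(X, w, b))

# Drugs where known markers alone underperform are candidates for marker discovery.
flagged = [drug for drug, score in results.items() if score < THRESHOLD]

for drug, score in results.items():
    print(f"{drug}: balanced accuracy = {score:.2f}")
print("discovery candidates:", flagged)
```

In this toy setting the model for drug_A classifies every isolate correctly, while the model for drug_B misses the resistant isolates that carry no known marker, so only drug_B is flagged. That gap between phenotype and known-marker prediction is the kind of signal the abstract describes as a discovery opportunity.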

Rapid Reviews Infectious Diseases
May 16, 2025

Georgios Feretzakis

Review 3: "Finding the Known Unknowns: Minimal Machine Learning Models of Resistance Identify Novel Antibiotic Resistance Discovery Opportunities in Klebsiella Pneumoniae"

Peer reviewers commend the study for its robust methodology, novel comparative analysis of AMR databases, and its relevance to improving genome-based resistance prediction.

Rapid Reviews Infectious Diseases
May 7, 2025

Samuel Shelburne

Review 2: "Finding the Known Unknowns: Minimal Machine Learning Models of Resistance Identify Novel Antibiotic Resistance Discovery Opportunities in Klebsiella Pneumoniae"

Rapid Reviews Infectious Diseases
May 7, 2025

Lara Urban

Review 1: "Finding the Known Unknowns: Minimal Machine Learning Models of Resistance Identify Novel Antibiotic Resistance Discovery Opportunities in Klebsiella Pneumoniae"

Rapid Reviews Infectious Diseases
May 7, 2025

Strength of evidence

Reviewer(s):
L Urban (University of Zurich) | 📗📗📗📗◻️
S Shelburne (MD Anderson Cancer Center) | 📘📘📘📘📘
G Feretzakis (Hellenic Open University) | 📗📗📗📗◻️

Version published to 10.1101/2025.04.08.647753 on bioRxiv
Apr 8, 2025
