Amira: detection of AMR genes directly from long reads using gene-space de Bruijn graphs

Daniel Anderson
Leandro Lima
Trieu Le
Louise Judd
Ryan Wick
Zamin Iqbal

Abstract

Accurate detection of antimicrobial resistance (AMR) genes is essential for the surveillance, epidemiology and genotypic prediction of AMR. This is typically done by generating an assembly from the sequencing reads of a bacterial isolate and running AMR gene detection tools on the assembly. However, despite advances in long-read sequencing that have greatly improved the quality and completeness of bacterial genome assemblies, assembly tools remain prone to large-scale errors caused by repeats in the genome. These errors lead to inaccurate detection of AMR gene content, with a consequent impact on resistance prediction. In this work we present Amira, a tool to detect AMR genes directly from unassembled long-read sequencing data. Amira exploits the fact that a single long read spans multiple consecutive genes to construct gene-space de Bruijn graphs, in which the k-mer alphabet is the set of genes in the pan-genome of the species under study. Through this approach, the reads corresponding to different copies of AMR genes can be effectively separated based on the genomic context of the AMR genes, and used to infer the nucleotide sequence of each copy. Amira achieves significant improvements in genomic copy number recall and nucleotide accuracy, demonstrated through objective simulations and comparison with alternative read- and assembly-based methods on samples with manually curated truth assemblies. Applied to a dataset of 32 Escherichia coli samples with diverse AMR gene content, Amira achieves a mean genomic-copy-number recall of 98.4% with precision 97.9% and nucleotide accuracy 99.9%. Finally, we show that Amira consistently detects more true AMR genes than an assembly-based approach across all E. coli, K. pneumoniae and E. faecium nanopore datasets from the ENA (n = 8580, 2448 and 415, respectively).
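To make the gene-space idea concrete, here is a minimal Python sketch. It is an illustration of the general technique and not Amira's implementation. It assumes each read has already been translated into an ordered list of gene labels on a single strand; the labels `g0` to `g8` and `AMR` and all function names are hypothetical. Nodes are k-mers of genes, edges join k-mers that occur consecutively in a read, and reads are grouped by the k-mer centered on the AMR gene. Two copies of the same AMR gene with different flanking genes therefore fall into separate groups.

```python
from collections import defaultdict


def gene_kmers(read, k):
    """Return the consecutive k-mers (tuples of gene labels) along a read."""
    return [tuple(read[i:i + k]) for i in range(len(read) - k + 1)]


def build_gene_dbg(reads, k):
    """Build a toy gene-space de Bruijn graph.

    Nodes are k-mers of genes; a directed edge joins two k-mers that occur
    consecutively in a read (they overlap by k-1 genes). Each node also
    records the IDs of the reads that contain it.
    """
    edges = defaultdict(set)
    node_reads = defaultdict(set)
    for read_id, read in reads.items():
        kmers = gene_kmers(read, k)
        for kmer in kmers:
            node_reads[kmer].add(read_id)
        for a, b in zip(kmers, kmers[1:]):
            edges[a].add(b)
    return edges, node_reads


def reads_by_context(node_reads, target_gene, k):
    """Group reads by the gene k-mer centered on target_gene (k must be odd).

    Each group corresponds to one genomic context, i.e. one putative copy.
    """
    centre = k // 2
    return {
        kmer: sorted(read_ids)
        for kmer, read_ids in node_reads.items()
        if kmer[centre] == target_gene
    }


# Hypothetical reads: two copies of "AMR", one flanked by g1/g2, one by g3/g4.
reads = {
    "r1": ["g0", "g1", "AMR", "g2", "g5"],
    "r2": ["g1", "AMR", "g2"],
    "r3": ["g7", "g3", "AMR", "g4"],
    "r4": ["g3", "AMR", "g4", "g8"],
}

edges, node_reads = build_gene_dbg(reads, k=3)
groups = reads_by_context(node_reads, "AMR", k=3)
for context, read_ids in groups.items():
    print(context, read_ids)
# ('g1', 'AMR', 'g2') ['r1', 'r2']
# ('g3', 'AMR', 'g4') ['r3', 'r4']
```

Using whole genes as the alphabet means a single k-mer already spans several kilobases of sequence, which is why k-mers of only a few genes can be enough to tell repeated copies apart. A real tool would also have to handle reverse-strand reads, gene-annotation errors and coverage thresholds, none of which this sketch attempts.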

Version published to 10.1101/2025.05.16.654303 on bioRxiv, May 18, 2025