Nanopore long-read only genome assembly of clinical Enterobacterales isolates is complete and accurate

Dorottya Nagy
Valentina Pennetta
Gillian Rodger
Katie Hopkins
Christopher R. Jones
The NEKSUS Consortium
Susan Hopkins
Derrick Crook
A. Sarah Walker
Julie Robotham
Katie L. Hopkins
Alice Ledda
David Williams
Russell Hope
Colin S. Brown
Nicole Stoesser
Samuel Lipworth

Abstract

Given recent improvements in Nanopore sequencing accuracy, whole bacterial genome sequence reconstruction using Oxford Nanopore Technologies (“Nanopore”) long-read only sequencing may offer a lower-cost, higher-throughput alternative to ‘hybrid’ assembly for pathogen surveillance. We evaluated the accuracy of Nanopore long-read only genome assemblies of Enterobacterales, including plasmid reconstruction.

We sequenced 92 genomes from clinical Enterobacterales isolates, collected in England under a national surveillance program, with long-read Nanopore (R10.4.1, Dorado v5.0.0 super-high-accuracy basecalled) and short-read Illumina (NovaSeq) sequencing approaches. Genomes were assembled using three long-read only assemblers (Flye; Hybracter long; Autocycler) and three hybrid assemblers (Hybracter hybrid; Unicycler normal; Unicycler bold). Three polishing modalities (Medaka v2 with subsampled or un-subsampled long-reads; Polypolish + Pypolca with short-reads) were investigated.

Autocycler circularised the most chromosomes (87/92 [95%]). Plasmid sequence reconstruction was comparable between all assemblers except Flye, all recovering 90-96% of plasmids, although the ‘ground truth’ was uncertain. Flye performed worse than other assemblers on almost all metrics. Autocycler + Medaka (un-subsampled long-reads) was the most accurate long-read only assembler/polisher combination, comparable to hybrid assemblies (median 0 [IQR:0-0] SNPs and 0 [IQR:0-1] indels per genome; quality value/Q score, 100 [IQR: 64-100]), with only 4/92 genome sequences having >10 SNPs/indels. Medaka polishing with un-subsampled long-reads resulted in small improvements in indels but not SNPs for both Flye and Autocycler assemblies. Seven-locus MLST, antimicrobial resistance, virulence, and stress gene annotation was equivalent across assembler/polisher combinations.
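The quality value (Q score) reported above is a Phred-scaled error rate. As a minimal sketch of how such a value relates to per-genome error counts, the snippet below assumes the conventional definition Q = -10·log10(errors / genome length) and assumes that error-free assemblies are assigned a cap of 100, which would be consistent with the reported median of 100 alongside a median of 0 SNPs and indels. The function name `phred_qv`, the cap parameter, and the 5 Mb genome length are illustrative assumptions and are not taken from the study's pipeline.

```python
import math


def phred_qv(errors: int, genome_length: int, cap: float = 100.0) -> float:
    """Phred-scaled quality value for an assembly.

    errors:        total SNPs + indels found against the reference signal
    genome_length: assembly length in bases
    cap:           value returned for error-free assemblies (assumption:
                   the true value is infinite, so a ceiling is needed)
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    if errors == 0:
        return cap
    # Standard Phred scaling of the per-base error rate, limited to the cap.
    return min(cap, -10.0 * math.log10(errors / genome_length))


# Illustrative values only: a hypothetical 5 Mb genome.
print(phred_qv(0, 5_000_000))   # error-free assembly returns the cap
print(phred_qv(5, 5_000_000))   # 5 errors in 5 Mb is one error per million bases
```

Under these assumptions, an assembly with one error per million bases scores about Q60, and each tenfold reduction in errors adds 10 to the score.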

Nanopore long-read only bacterial genome assembly with Autocycler combined with Medaka polishing (using un-subsampled reads) is similarly accurate to, and possibly more complete than, hybrid assembly. It represents a viable alternative for incorporating high-quality genomic data, including plasmids, into Enterobacterales surveillance.

Data Summary

Nanopore long-reads and Illumina short-reads from the 92 Enterobacterales isolates from this study have been uploaded to ENA (BioProject accession: PRJEB93885). Code for the Nextflow assembly pipeline, downstream analysis scripts, and R statistical analysis scripts are available on GitHub (https://github.com/oxfordmmm/NEKSUS_ont_hybrid_assembly_comparison). The following supplementary data tables are available on FigShare (https://figshare.com/account/home#/projects/253775):

ENA Sample accessions and sample metadata (accessions_and_metadata.csv)
Seqkit stats summaries of the Illumina and Nanopore reads (raw_qc_sup.csv)
Summary of assembly contig features (contigs_summary_sup_cleaned.csv)
Pairwise mash distances between contigs (mash_cleaned.csv)
Plasmids matching across different assemblers compared to the Hybracter (hybrid) and manually-curated reference sets (plasmids_match_hybracter_mash.csv; plasmids_match_manual_mash.csv, respectively)
Seven-locus multi-locus sequence type annotation (mlst_cleaned.csv)
CheckM2 summaries of assemblies (checkm2_cleaned.csv)
Nucleotide-level accuracy of assemblies (SNPs, indels, and quality value relative to short-read mapping; assembly_nucleotide_accuracy_cleaned.csv)
Bakta annotation (bakta_by_contig_cleaned.csv)
AMRFinderPlus annotations of contigs (amrfinder_plus_cleaned.csv)
MOB-suite annotation summaries of contigs (mobsuite_cleaned.csv)

Impact Statement

Nanopore long-reads have historically been too error-prone to use alone for accurate bacterial genome assembly, necessitating additional Illumina short-reads to achieve structurally complete and accurate ‘hybrid’ genome assemblies for public health surveillance. This increases cost and complexity. Previous studies have shown that recent improvements in Nanopore chemistry (R10.4.1 flowcell) and basecalling (super-high accuracy) allow high-quality long-read only assemblies, although these evaluations used a small number of laboratory reference strains. To our knowledge, this is the first study to compare Nanopore long-read only genome assembly with hybrid assembly on a large number of clinical isolates. It is also the first large-scale evaluation of the recently released automated consensus long-read assembly tool, Autocycler.

We show that Autocycler long-read only assemblies are more structurally complete for chromosomal sequences, while reconstructing a similar number of plasmids to other long-read and hybrid assemblers. Most long-read polished, Autocycler-assembled genome sequences have 0 errors (median: 0 SNPs/indels) relative to short-read polished (hybrid) Autocycler assemblies, enabling accurate annotation of key genes.

Version published to 10.1101/2025.09.15.676237 on bioRxiv
Sep 17, 2025
