Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim

Chen Yang
Theodora Lo
Ka Ming Nip
Saber Hafezqorani
René L Warren
Inanc Birol

This article has been Reviewed by the following groups

Listed in

Evaluated articles (GigaScience)

Abstract

Background

Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment.

Results

Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task.
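To illustrate why base-level quantification matters for reads of very uneven length, here is a minimal sketch in Python. It is not the Meta-NanoSim algorithm, which the abstract does not spell out. It only contrasts counting reads with counting aligned bases on toy data. The function names and the `(genome, aligned_bases)` input format are assumptions made for this example.

```python
from collections import defaultdict

def abundance_by_reads(alignments):
    """Relative abundance from the number of reads assigned to each genome."""
    counts = defaultdict(int)
    for genome, _aligned_bases in alignments:
        counts[genome] += 1
    total = sum(counts.values())
    return {genome: n / total for genome, n in counts.items()}

def abundance_by_bases(alignments):
    """Relative abundance from the number of aligned bases per genome."""
    bases = defaultdict(int)
    for genome, aligned_bases in alignments:
        bases[genome] += aligned_bases
    total = sum(bases.values())
    return {genome: n / total for genome, n in bases.items()}

# Toy input (hypothetical): one long read from genome_A, three short reads from genome_B.
toy_alignments = [
    ("genome_A", 9000),
    ("genome_B", 1000),
    ("genome_B", 1000),
    ("genome_B", 1000),
]

by_reads = abundance_by_reads(toy_alignments)
by_bases = abundance_by_bases(toy_alignments)
```

On this toy input, read counting assigns genome_A a quarter of the community, while base counting assigns it three quarters, because its single read carries most of the sequenced bases.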

Conclusions

The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.

GigaScience
Apr 10, 2023

Nanopore seq

Reviewer2-Hadrien Gourlé

Thank you for a great piece of software and a great article.

A few minor points:

l166-168: the clause about structural variants is unclear to me, and perhaps to the reader. Please consider rephrasing.

l209-210: I understand that the number of mapped reads is inadequate for abundance estimation of ONT data, but k-mers should not suffer from the same problem, should they? The number of k-mers matching a genomic region (or a genome) will scale appropriately with read length. I therefore have a hard time understanding why k-mers are presented as problematic in the first sentence of the Abundance Estimation paragraph.

l379: What do you mean by "pronounced"? Please consider rephrasing.

l438: geomes -> genomes

figure 2: panel 2 should have the same theme as the other subplots of the figure

software comments:

- I'd like to be able to run the examples present in the documentation out of the box: please add a link and instructions on how to download and unpack the zymo community
- Speaking of the zymo community, why do Campylobacter and S. cerevisiae have a different path than the other genomes in the examples?
- If you plan not to update the pre-trained error models to a more recent version of scipy, please pin scipy 0.22.1 in the bioconda recipe, so that users can use pre-trained models out of the box
- Please make a new release of the software including pull request !67
- In a future version of Nanosim, I urge you to consider gathering all scripts into subcommands (i.e. nanosim simulate [--params] instead of simulate.py [--params]). I realise this is a big breaking change, but it is good practise and avoids polluting a user's PATH with many scripts. This change is in my opinion not required for the paper to be published, but something I'd like you to consider for a future release
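The subcommand layout suggested in the last point could look roughly like the sketch below, built on Python's standard `argparse` module. This is a hypothetical illustration, not NanoSim's actual command-line interface: the `nanosim` program name follows the reviewer's example, while the `characterize` subcommand and the `--number` and `--reads` options are placeholders invented here.

```python
import argparse

def build_parser():
    """Build a single entry point with one subcommand per former script."""
    parser = argparse.ArgumentParser(prog="nanosim")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Hypothetical replacement for a stand-alone simulate.py script.
    simulate = subcommands.add_parser("simulate", help="simulate reads")
    simulate.add_argument("--number", type=int, default=1000,
                          help="number of reads to simulate (placeholder option)")

    # Hypothetical replacement for a stand-alone characterization script.
    characterize = subcommands.add_parser("characterize", help="build a read profile")
    characterize.add_argument("--reads",
                              help="path to training reads (placeholder option)")

    return parser

# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["simulate", "--number", "10"])
```

With this layout only one executable needs to be on the user's PATH, and each subcommand keeps its own options and help text.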

GigaScience
Apr 10, 2023

ABSTRACT

Reviewer1-Andre Rodrigues-Soares

This manuscript is of very good quality - as such, my review is quite limited. I would like to congratulate the authors for the development of the software and its comprehensive and extensive benchmarking.

On l. 64 - Given the range of samples that can currently be sequenced using Nanopore sequencing and the recent focus on short reads as opposed to the previous highlight given to long reads, this statement is out of date. I would recommend describing instead the current range of sizes that can be sequenced via Nanopore.

After reading the manuscript, I am, however, left to wonder how the authors would approach the different error rates of different flowcell types (R9 vs R10). It can be assumed that error rates of R9 sequences might indeed average 10% as stated in the manuscript, but with the advent of R10 flowcells (more recently R10.4.1) and their updated chemistries, the error rates have decreased significantly. A single error rate of 10%, as stated in the manuscript, can no longer be assumed for the sequencing technology as a whole. While I don't think this should be central to the development of the tool, I think it should be addressed in the manuscript in some way.

I would also have liked to see distributions of PHRED quality scores in the simulated reads in the analyses conducted in the manuscript. Although the assembly and genome recovery statistics, notably in Figure 4, indicate these should have the expected distributions, I would have liked to understand how quality scores are distributed in the generated reads.

If the two issues above are addressed in the manuscript, I will be happy to recommend its publication. I have no further comments to add, as the manuscript covers all other factors I would think could be worrying regarding a tool simulating Nanopore metagenomic reads.
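For readers less familiar with the link between the error rates and the PHRED scores discussed above, the standard definition is Q = -10 log10(p), where p is the per-base error probability. A 10% error rate therefore corresponds to Q10. The short Python sketch below applies that definition to a FASTQ quality string. The helper names are chosen for this example, and the ASCII offset of 33 is an assumption (the common Sanger encoding).

```python
import math

def phred_to_error_prob(q):
    """Convert a PHRED quality score to a per-base error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert a per-base error probability to a PHRED quality score."""
    return -10 * math.log10(p)

def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores (offset 33 assumed)."""
    return [ord(char) - offset for char in quality_string]

def mean_error_rate(quality_string, offset=33):
    """Average the per-base error probabilities over one read."""
    scores = decode_fastq_quality(quality_string, offset)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)

# '+' encodes Q10 under offset 33, so a read of '+' characters has a 10% mean error rate.
example_rate = mean_error_rate("++++")
```

Averaging the error probabilities, rather than the Q scores themselves, gives the expected error rate of a read, because the PHRED scale is logarithmic.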

Version published to 10.1093/gigascience/giad013
Jan 1, 2023
Version published to 10.1101/2021.11.19.469328 on bioRxiv
Nov 20, 2021

