Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim

Abstract

Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, platform-specific challenges, including high base-call error rate, non-uniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical tools. Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. Further, Meta-NanoSim improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenomic assembly benchmarking task.

Article activity feed

  1. Nanopore seq

    Reviewer2-Hadrien Gourlé

    Thank you for a great piece of software and a great article. A few minor points:

    l166-168: The clause about structural variant is unclear to me, and perhaps to the reader. Please consider rephrasing.

    l209-210: I understand that the number of mapped reads is inadequate for abundance estimation of ONT data, but k-mers should not suffer from the same problem, should they? The number of k-mers matching a genomic region (or a genome) will scale appropriately with read length. I therefore have a hard time understanding why k-mers are presented as problematic in the first sentence of the Abundance Estimation paragraph.

    l379: What do you mean by "pronounced"? Please consider rephrasing.

    l438: geomes -> genomes

    Figure 2: panel 2 should have the same theme as the other subplots of the figure.

    Software comments:

    - I'd like to be able to run the examples present in the documentation out of the box: please add a link and instructions on how to download and unpack the zymo community.
    - Speaking of the zymo community, why do Campylobacter and S. cerevisiae have a different path than the other genomes in the examples?
    - If you plan not to update the pre-trained error models to a more recent version of scipy, please pin scipy 0.22.1 in the bioconda recipe, so that users can use the pre-trained models out of the box.
    - Please make a new release of the software including pull request !67.
    - In a future version of Nanosim, I urge you to consider gathering all scripts into subcommands (i.e. nanosim simulate [--params] instead of simulate.py [--params]). I realise this is a big breaking change, but it is good practise and avoids polluting a user's PATH with many scripts. This change is in my opinion not required for the paper to be published, but something I'd like you to consider for a future release.
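The subcommand layout suggested in the last software comment could look roughly like the sketch below, which uses Python's standard `argparse` subparsers. It is only an illustration of the pattern, not NanoSim's actual interface: the subcommand names `simulate` and `characterize`, the options `--number` and `--reads`, and the handler functions are all hypothetical.

```python
import argparse


def run_simulate(args: argparse.Namespace) -> str:
    # Hypothetical handler; a real tool would call its simulation code here.
    return f"simulate {args.number} reads"


def run_characterize(args: argparse.Namespace) -> str:
    # Hypothetical handler; a real tool would call its profiling code here.
    return f"characterize {args.reads}"


def build_parser() -> argparse.ArgumentParser:
    """Build one entry point with a subcommand per former standalone script."""
    parser = argparse.ArgumentParser(prog="nanosim")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand and option names, chosen for illustration only.
    simulate = subparsers.add_parser("simulate", help="simulate reads")
    simulate.add_argument("--number", type=int, default=1000)
    simulate.set_defaults(func=run_simulate)

    characterize = subparsers.add_parser("characterize", help="profile reads")
    characterize.add_argument("--reads", required=True)
    characterize.set_defaults(func=run_characterize)
    return parser


def main(argv=None) -> str:
    """Parse the arguments and dispatch to the selected subcommand."""
    args = build_parser().parse_args(argv)
    return args.func(args)


# Equivalent to running `nanosim simulate --number 5` on the command line.
print(main(["simulate", "--number", "5"]))
```

With this layout only one executable needs to be on the user's PATH, and each former script becomes a handler function registered through `set_defaults`.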

  2. ABSTRACT

    Reviewer1-Andre Rodrigues-Soares

    This manuscript is of very good quality; as such, my review is quite limited. I would like to congratulate the authors on the development of the software and its comprehensive and extensive benchmarking.

    On l. 64: Given the range of samples that can currently be sequenced using Nanopore sequencing, and the recent focus on short reads as opposed to the previous emphasis on long reads, this statement is out of date. I would recommend describing instead the current range of sizes that can be sequenced via Nanopore.

    After reading the manuscript, I am, however, left to wonder how the authors would approach the different error rates of different types of flowcell (R9 vs R10). It can be assumed that error rates of R9 sequences might indeed average 10%, as stated in the manuscript, but with the advent of R10 flowcells (more recently R10.4.1) and their updated chemistries, the error rates have decreased significantly. A single error rate of 10%, as stated in the manuscript, can no longer be assumed for the sequencing technology as a whole. While I don't think this should be central to the development of the tool, I think it should be addressed in the manuscript in some way.

    I would also have liked to see distributions of PHRED quality scores in the simulated reads in the analyses conducted in the manuscript. Although the assembly and genome recovery statistics, namely in Figure 4, indicate these should have the expected distributions, I would have liked to understand how quality scores are distributed in the generated reads.

    If the two issues above are addressed in the manuscript, I will be happy to recommend its publication. I have no further comments to add, as the manuscript covers all other factors I would think could be worrying regarding a tool simulating Nanopore metagenomic reads.
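For readers relating the 10% error rate to PHRED quality scores: the standard definition is Q = -10 log10(p), so a 10% per-base error probability corresponds to Q10. The sketch below applies that conversion to a FASTQ quality string. It assumes the common Phred+33 ASCII encoding; the function names are illustrative and not taken from Meta-NanoSim.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a PHRED score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p to a PHRED score Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def mean_error_rate(quality_string: str, offset: int = 33) -> float:
    """Mean per-base error probability of one read.

    Assumes Phred+33 encoding (ASCII code minus 33 gives Q). Probabilities
    are averaged rather than Q scores, because averaging Q directly
    understates the expected error rate.
    """
    probs = [phred_to_error_prob(ord(c) - offset) for c in quality_string]
    return sum(probs) / len(probs)


# '+' encodes Q10 (p = 0.1) and '5' encodes Q20 (p = 0.01) under Phred+33.
print(mean_error_rate("+5"))
```

Summarizing each simulated read this way, and then plotting the per-read values, is one way the requested quality-score distributions could be compared between R9-like and R10-like data.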