BugSplit enables genome-resolved metagenomics through highly accurate taxonomic binning of metagenomic assemblies
Abstract
A large gap remains between sequencing a microbial community and characterizing all of the organisms inside of it. Here we develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. We perform nanopore mNGS on patients with COVID-19, and using a reference database predating COVID-19, demonstrate that BugSplit’s taxonomic binning enables sensitive and specific detection of a novel coronavirus not possible with other approaches. When applied to nanopore mNGS data from cases of Klebsiella pneumoniae and Neisseria gonorrhoeae infection, BugSplit’s taxonomic binning accurately separates pathogen sequences from those of the host and microbiota, and unlocks the possibility of sequence typing, in silico serotyping, and antimicrobial resistance prediction of each organism within a sample. BugSplit is available at https://bugseq.com/academic.
Article activity feed
SciScore for 10.1101/2021.10.16.464647:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Ethics: not detected.
- Sex as a biological variable: not detected.
- Randomization: As minimap2 randomly picks a primary alignment if there are multiple alignments with equal top score, we collapse equally good top hits to their lowest common ancestor.
- Blinding: not detected.
- Power Analysis: not detected.
Table 2: Resources
Software and Algorithms (sentence, followed by the detected resource and its suggested identifier)
- Sentence: A mash database [44], published by the mash authors and comprising all genomes and plasmid sequences in Refseq (https://gembox.cbcb.umd.edu/mash/refseq.genomes%2Bplasmid.k21s1000.msh) is used for homology search with Homopolish.
  Resource: Refseq; suggested: (RefSeq, RRID:SCR_003496)
- Sentence: In brief, plasmid sequences are identified with PlasmidFinder [58], and their taxonomic identities are overridden to that of “plasmid sequences” (NCBI taxon 36549).
  Resource: PlasmidFinder [58]; suggested: None
- Sentence: MMseqs2 and DIAMOND were run with the NCBI non-redundant amino acid database as suggested by their authors.
  Resource: DIAMOND; suggested: (DIAMOND, RRID:SCR_009457)
- Sentence: These files were generated by converting the NCBI taxonomy files (names.dmp and nodes.dmp) provided with the CAMI datasets into Newick format with the Python taxonomy package [59].
  Resource: Python; suggested: (IPython, RRID:SCR_001658)
- Sentence: Ground truths were generated by comparing each contig in our metagenomic assembly to the reference genome of each organism contained within the mock microbial community using MegaBLASTN.
  Resource: MegaBLASTN; suggested: None
- Sentence: The taxonomic identification of the top BLAST hit for each contig was determined to be its gold standard assignment.
  Resource: BLAST; suggested: (BLASTX, RRID:SCR_001653)
- Sentence: Binning completion and contamination were assessed with CheckM using the default CheckM database.
  Resource: CheckM; suggested: (CheckM, RRID:SCR_016646)
- Sentence: The NCBI nucleotide database from 2019 was downloaded from the second CAMI challenge (https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_blast/nt.gz) and used in place of BugSplit’s default database for the emerging coronavirus application.
  Resource: BugSplit’s; suggested: None
Results from OddPub: Thank you for sharing your data.
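The Randomization entry above describes collapsing equally good top hits to their lowest common ancestor. A minimal sketch of that idea follows; it is not BugSplit's implementation. It assumes the taxonomy is available as a child-to-parent mapping (with the root as its own parent, as in NCBI nodes.dmp) and that each alignment has already been reduced to a (taxon ID, score) pair. The toy taxon IDs and the function names `lineage`, `lowest_common_ancestor` and `collapse_top_hits` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Toy child -> parent taxonomy (hypothetical IDs, not real NCBI taxids).
# 1 = root, 10 = family, 20/30 = genera, 21/22/31 = species.
PARENT: Dict[int, int] = {1: 1, 10: 1, 20: 10, 30: 10, 21: 20, 22: 20, 31: 30}


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:  # the root is its own parent
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(taxids: Iterable[int], parent: Dict[int, int]) -> int:
    """Return the deepest taxon shared by the lineages of all taxids."""
    paths = [lineage(t, parent)[::-1] for t in taxids]  # root-first order
    if not paths:
        raise ValueError("at least one taxon is required")
    lca = paths[0][0]
    for level in zip(*paths):  # walk down from the root, one rank at a time
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def collapse_top_hits(hits: List[Tuple[int, int]], parent: Dict[int, int]) -> int:
    """Assign a contig to the LCA of all hits that tie for the best score."""
    best = max(score for _, score in hits)
    top = {taxid for taxid, score in hits if score == best}
    return lowest_common_ancestor(top, parent)


if __name__ == "__main__":
    # Two species of the same genus tie for the top score: assign the genus.
    print(collapse_top_hits([(21, 100), (22, 100), (31, 90)], PARENT))
```

With this approach the assignment no longer depends on which of the tied alignments the aligner happens to report as primary, at the cost of a less specific rank when ties span several taxa.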
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
By incorporating graph topology and linkage of contigs, we will be able to mitigate this limitation and place the contig in multiple strain-level taxonomic bins. Further exploration of the parameter space of BugSplit may also result in improved binning. For example, minimap2 could be tuned for greater alignment recall while preserving precision than its default “map-ont” setting, and voting coverage thresholds may be able to be tuned for improved classification of contigs across the taxonomic hierarchy. Ultimately, we expect to adopt a strategy that will allow optimal values for key parameters to be determined by the taxonomic lineage of alignments.
BugSplit is a highly accurate tool for taxonomic binning and profiling of third-generation metagenomic data with computing speeds faster than comparable workflows. We show that using BugSplit to bin metagenomic assemblies has several substantial downstream effects, including enabling highly similar species discrimination and identification, novel species identification and universal, pathogen-agnostic taxonomic profiling. When combined with automated assembly, polishing and post-processing of bins, we demonstrate that detecting pathogens, strain-typing them and accurately predicting their antimicrobial resistance directly from complex samples with mNGS becomes feasible.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.