Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets

Nicholas D. Youngblut
Ruth E. Ley

This article has been reviewed by the following groups


Listed in: Evaluated articles (PeerJ)

Abstract

Mapping metagenome reads to reference databases is the standard approach for assessing microbial taxonomic and functional diversity from metagenomic data. However, public reference databases often lack recently generated genomic data such as metagenome-assembled genomes (MAGs), which can limit the sensitivity of read-mapping approaches. We previously developed the Struo pipeline to provide a straightforward method for constructing custom databases; however, that pipeline does not scale well enough to cope with the ever-increasing number of publicly available microbial genomes, and it does not allow existing databases to be updated efficiently as new data are generated. To address these issues, we developed Struo2, which is >3.5-fold faster than Struo at database generation and can also efficiently update existing databases. We also provide custom Kraken2, Bracken, and HUMAnN3 databases that can be easily updated with new genomes and/or individual gene sequences. Efficient database updating, coupled with our pre-generated databases, enables “assembly-enhanced” profiling, which increases database comprehensiveness by including native genomic content. Adding newly generated genomic content can greatly increase database comprehensiveness, especially for understudied biomes, and will thereby enable more accurate assessments of microbiome diversity.

PeerJ
Sep 16, 2021

Version published to 10.7717/peerj.12198
Sep 16, 2021
Version published to 10.1101/2021.02.10.430604v1 on bioRxiv
Feb 10, 2021

Related articles

Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow featuring Machine Learning-Enabled Resource Allocation

This article has 6 authors:
1. Peter Belmann
2. Benedikt Osterholz
3. Nils Kleinboelting
4. Alfred Puehler
5. Andreas Schlueter
6. Alexander Sczyrba
This article has no evaluations.
Latest version: Nov 12, 2024

GBRAP: A Comprehensive Database and Tool for Exploring Genomic Diversity Across All Domains of Life

This article has 5 authors:
1. Sachithra Kalhari Yaddehige
2. Chiara Vischioni
3. Leonardo Alberghini
4. Michele Berselli
5. Cristian Taccioli
This article has no evaluations.
Latest version: Oct 31, 2024

De novo clustering of extensive long-read transcriptome datasets with isONclust3

This article has 2 authors:
1. Alexander J. Petri
2. Kristoffer Sahlin
This article has no evaluations.
Latest version: Nov 3, 2024
