MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data

Abstract

Background

With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts.

Findings

We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource.
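The core idea of re-inspecting the non-human-mapping read fraction can be illustrated with a minimal sketch. It assumes the reads have already been aligned to the human genome and written as SAM text, and it relies only on the SAM convention that flag bit 0x4 marks an unmapped read. The function name `unmapped_reads` and the toy records are hypothetical and are not taken from the MetaMap code; the actual pipeline may select and classify reads differently.

```python
from typing import Iterable, Iterator, Tuple

# SAM specification: flag bit 0x4 is set when the read itself is unmapped.
UNMAPPED_FLAG = 0x4


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for every unmapped record in SAM text.

    Hypothetical helper for illustration; not part of the MetaMap pipeline.
    """
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment record has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            # Column 1 is the read name, column 10 the read sequence.
            yield fields[0], fields[9]


# Toy input: one read aligned to the human reference, one left unmapped.
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
candidates = list(unmapped_reads(example))
```

In this sketch, the reads collected in `candidates` would be the ones passed on to a microbial and viral classification step.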

Conclusions

Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation about the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giy070

    Authors: LM Simon (1), AJ Westermann (2, 3), M Engel (1, 4), AHA Elbehery (5), B Hense (1), M Heinig (1), L Deng (5), FJ Theis (1, 6)

    Affiliations:
    (1) Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
    (2) Institute for Molecular Infection Biology, University Würzburg, Würzburg, Germany
    (3) Helmholtz Institute for RNA-Based Infection Research (HIRI), Würzburg, Germany
    (4) Helmholtz Zentrum München, German Research Center for Environmental Health, Scientific Computing Research Unit, Neuherberg, Germany
    (5) Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Virology, Neuherberg, Germany
    (6) Department of Mathematics, Technische Universität München, Munich, Germany

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giy070 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101204
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101205