Metagenomics analysis for microbial ecology investigation on historical samples: negligible effect of host DNA and optimal analysis strategies


Abstract

Microbiome composition and function are strongly influenced by environmental factors, with major shifts driven by intensified anthropogenic pressures over the past centuries. This timeframe extends beyond the scope of the traditional experimental or longitudinal studies commonly used to investigate microbiome dynamics. Despite their vast potential, historical samples held in museums and herbaria worldwide remain underutilized for studying host-microbiome interactions across broad temporal and spatial scales, owing to incompatibilities with standard analytical pipelines and a limited understanding of optimal classification parameters. While host DNA removal has traditionally been considered essential for accurate taxonomic assignment of metagenomic reads, this step is impractical for many historical samples because reference genomes are unavailable for the host species. Here, we show that host DNA content does not significantly affect microbial ecological analyses in both contemporary and historical samples. In addition, DNA molecules from historical samples are highly fragmented and uneven in length, so conventional analysis workflows may be inefficient. To address this, we carried out detailed analyses of the impact of k-mer size on the accuracy of metagenomic assignments in historical samples. We propose a simple two-step approach in which reads are classified using two annotation databases constructed with k=24 and k=31. Through a simulation study, we demonstrate that this approach outperforms conventional workflows in recovering microbial signals across a wide range of read lengths, including fragments as short as 24 bp (21 bp if k=21 is used). Together, this study provides a solid foundation for incorporating natural history collections into host-associated microbiome research, offering valuable insights into the long-term effects of anthropogenic change on microbial communities.
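
The reason k sets a floor on usable read length is that a read shorter than k contains no complete k-mer, so a database built with that k has nothing to match against. The minimal Python sketch below illustrates this by sorting reads according to which of the two databases (k=31 or k=24) could index them at all. The abstract does not say how the two classification steps are combined, so routing purely by read length is an assumption made here for illustration, and the function name `route_reads` and the bin labels are hypothetical rather than part of the authors' workflow.

```python
from typing import Dict, Iterable, List, Tuple

K_SHORT = 24  # smaller k named in the abstract
K_LONG = 31   # larger k named in the abstract


def route_reads(
    reads: Iterable[Tuple[str, str]],
    k_short: int = K_SHORT,
    k_long: int = K_LONG,
) -> Dict[str, List[str]]:
    """Partition (read_id, sequence) pairs by the k-mer database able to index them.

    Assumption (not stated in the abstract): reads are assigned by length alone.
    A read of length n contains max(0, n - k + 1) k-mers, so it needs n >= k.
    """
    bins: Dict[str, List[str]] = {"k_long": [], "k_short": [], "too_short": []}
    for read_id, seq in reads:
        n = len(seq)
        if n >= k_long:
            bins["k_long"].append(read_id)     # has at least one 31-mer
        elif n >= k_short:
            bins["k_short"].append(read_id)    # has a 24-mer but no 31-mer
        else:
            bins["too_short"].append(read_id)  # no complete k-mer for either database
    return bins


if __name__ == "__main__":
    example = [("r1", "A" * 50), ("r2", "C" * 27), ("r3", "G" * 20)]
    print(route_reads(example))
```

Under this assumption, reads of 24 to 30 bp, which a k=31 database alone could not classify, remain available to the k=24 database; passing `k_short=21` would lower the floor to 21 bp, matching the lower limit quoted in the abstract.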
