Optimising High-throughput sequencing data analysis, from gene database selection to the analysis of compositional data: A case study on tropical soil nematodes



Abstract

High-throughput sequencing (HTS) provides an efficient and cost-effective way to generate large amounts of sequence data. However, marker-based methods and the resulting datasets come with a range of challenges and disputes, including incomplete reference databases, controversial sequence similarity thresholds for delineating taxa, and the compositional nature of the data in downstream analysis. Here, we use HTS data from a soil nematode biodiversity experiment to address the following questions: (1) how the choice of reference database affects HTS data analysis, (2) whether the same ecological patterns are detected with amplicon sequence variants (ASVs, 100% similarity) versus classical operational taxonomic units (OTUs, 97% similarity), and (3) how different data normalization methods affect the recovery of beta diversity patterns and the identification of differentially abundant taxa. At the time of this study, the SILVA database performed better than PR2, assigning more reads to family level and providing higher phylogenetic resolution. ASV- and OTU-based alpha and beta diversity of nematodes correlated closely, indicating that OTU-based studies represent useful reference points. For downstream data analyses, our results indicate that rarefaction-based methods are more prone to missing findings, while methods based on the centered log-ratio (clr) transformation may overestimate tested effects. ANCOM-BC retains all data and accounts for the uneven sampling fraction of each sample, suggesting that it is currently the optimal method for analyzing compositional data. Overall, our study highlights the importance of comparing and selecting taxonomic reference databases before data analysis, and provides solid evidence for the similarity and comparability of OTU- and ASV-based nematode studies. Further, the results highlight the potential weaknesses of rarefaction-based and clr-transformation-based methods. We recommend that future studies use ASVs and that both the taxonomic reference database and the normalization strategy be carefully tested and selected before analyzing the data.
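To make the contrast between the two normalization families concrete, the following is a minimal sketch in Python with NumPy, not the pipeline used in the study. The count table, the pseudocount of 0.5, the choice of the shallowest sample as rarefaction depth, and the function names `rarefy` and `clr` are all assumptions made for illustration. Rarefaction subsamples every sample to a common depth and so discards reads, whereas the clr transformation keeps all reads but needs a pseudocount to handle zeros.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample one sample's counts without replacement to a fixed depth."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand to one taxon label per read, draw `depth` reads, and re-tally.
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=depth, replace=False)
    return np.bincount(kept, minlength=counts.size)

def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each part minus the sample's mean log.

    The pseudocount is an illustrative assumption for handling zero counts.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=-1, keepdims=True)

# Hypothetical toy table: rows are samples, columns are taxa.
# These numbers are invented and are not data from the study.
table = np.array([
    [120, 30, 0, 50],
    [900, 250, 10, 340],
])

rng = np.random.default_rng(0)
depth = int(table.sum(axis=1).min())  # rarefy to the shallowest sample
rarefied = np.vstack([rarefy(row, depth, rng) for row in table])
transformed = clr(table)
```

In this toy table the deeper sample loses most of its reads under rarefaction, which illustrates how low-abundance taxa can drop out, while every clr-transformed row sums to zero because each value is expressed relative to the sample's own geometric mean.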
