The evaluation of different combinations of enzyme set, aligner and caller in GBS sequencing of soybean
Abstract
Background Genotyping-by-sequencing (GBS) is a cost-effective method for large-scale genotyping, widely used across various species, particularly those with large genomes. Two critical aspects of GBS are the selection of restriction enzymes for genome digestion and the optimization of the data analysis pipeline. However, few studies have comprehensively examined the combined effects of enzyme choice and pipeline configuration.

Results In this study, we created GBS libraries using three commonly used restriction enzyme sets (HindIII-NlaIII, PstI-MspI, and ApeKI) and assessed multiple SNP-calling pipelines in 15 soybean varieties. We tested four aligners (BWA-MEM, Bowtie2, BBMap, and Strobealign) and seven SNP callers (Bcftools, Stacks, DeepVariant, FreeBayes, VarScan, BBCallVariants, and GATK). Our findings reveal that enzyme choice significantly influences the number of identified SNPs, their gene localization preferences, and their accuracy. Furthermore, the performance of the SNP callers varied markedly in terms of SNP count, precision, recall, and false discovery rate (FDR). DeepVariant exhibited the highest accuracy, with 76.0% of its SNPs intersecting with whole-genome sequencing (WGS)-derived SNPs and an FDR of 0.0095, compared to FreeBayes, which had 47.8% intersection and an FDR of 0.6321.

Conclusions Our findings underscore the importance of optimizing both enzyme selection for sequencing libraries and data analysis pipelines to ensure robust and reproducible results. This study provides a general framework for designing large-scale genotyping experiments aimed at specific quality and quantity requirements in various plant species.
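The caller comparison above rests on precision, recall, and FDR measured against a reference call set. The sketch below shows one common way to compute these metrics from two sets of SNPs. It is an illustration only: the abstract does not state the exact definitions or filters the authors used, so the textbook formulas, the `(chromosome, position, ref, alt)` key, the use of WGS calls as the truth set, the function name `compare_calls`, and the toy data are all assumptions.

```python
from typing import NamedTuple, Set, Tuple

# Assumed SNP key: (chromosome, position, reference allele, alternate allele).
Snp = Tuple[str, int, str, str]


class CallMetrics(NamedTuple):
    precision: float
    recall: float
    fdr: float


def compare_calls(called: Set[Snp], truth: Set[Snp]) -> CallMetrics:
    """Compare a GBS call set with a truth set (assumed here: WGS-derived SNPs)."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called

    # Textbook definitions; guard against empty sets to avoid division by zero.
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    fdr = fp / (tp + fp) if called else 0.0
    return CallMetrics(precision, recall, fdr)


# Toy data, invented purely for illustration.
gbs_calls = {
    ("Chr01", 100, "A", "G"),
    ("Chr01", 250, "C", "T"),
    ("Chr02", 75, "G", "A"),
    ("Chr02", 900, "T", "C"),
}
wgs_calls = {
    ("Chr01", 100, "A", "G"),
    ("Chr01", 250, "C", "T"),
    ("Chr02", 75, "G", "A"),
    ("Chr03", 10, "A", "C"),
    ("Chr03", 20, "G", "T"),
}

metrics = compare_calls(gbs_calls, wgs_calls)
print(metrics)
```

In practice, recall for GBS data is usually evaluated only over the genomic regions that the reduced-representation library actually covers, so the truth set would typically be restricted to those regions before the comparison. That restriction is omitted here for brevity.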