A Novel Efficient Algorithm for Common Variants Genotyping from Low-Coverage Sequencing Data
Abstract
Low-coverage whole-genome sequencing (LC-WGS) combined with imputation is a cost-effective genotyping strategy for genome-wide association studies (GWAS) in population genetics. In this study, we developed the Limpute algorithm specifically for genotyping from low-coverage sequencing data. Limpute extracts variant information from low-coverage reads using novel virtual probes and then performs imputation by cross-referencing information between samples. We compared Limpute with GLIMPSE2, currently the dominant algorithm for low-coverage sequencing data. Limpute achieved similar imputation performance for common variants (r² > 0.87), while GLIMPSE2 took approximately five times as long to run. To fully evaluate the accuracy of Limpute's genotype imputation, we used three validation sets: high-coverage whole-genome sequencing data (30x), microarray data, and high-coverage whole-exome sequencing data (30x). The results showed that Limpute imputes common variants well from low-coverage sequencing data (1x: r² > 0.87; 3x: r² > 0.92; 5x: r² > 0.93). In summary, we present a highly efficient, low-cost algorithm for genotyping from low-coverage sequencing data, offering substantial support for genetic research.
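The accuracy figures above are reported as r², but the abstract does not state exactly how r² is computed. A common convention in imputation benchmarks is the squared Pearson correlation between true genotypes from a validation set and imputed dosages. Under this convention the true genotypes are coded 0/1/2 copies of the alternate allele, the dosages are expected alternate-allele counts, and pairs are pooled across all variants in an allele-frequency class such as "common". The sketch below illustrates that convention only. It is not Limpute's or GLIMPSE2's code. The function names `dosage_r2` and `aggregate_r2_common` are hypothetical, and so is the minor-allele-frequency cut-off of 0.05 used to define "common" variants.

```python
from statistics import fmean
from typing import Sequence


def dosage_r2(truth: Sequence[float], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2 ALT copies)
    and imputed dosages (expected ALT copies in [0, 2])."""
    if len(truth) != len(dosage) or len(truth) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = fmean(truth), fmean(dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(truth, dosage))
    sxx = sum((x - mx) ** 2 for x in truth)
    syy = sum((y - my) ** 2 for y in dosage)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined if either vector is constant
    return (sxy * sxy) / (sxx * syy)


def aggregate_r2_common(
    truth_by_variant: Sequence[Sequence[int]],
    dosage_by_variant: Sequence[Sequence[float]],
    min_maf: float = 0.05,  # hypothetical "common variant" cut-off
) -> float:
    """Pool genotype/dosage pairs from variants whose minor allele frequency
    (estimated from the validation genotypes) is at least min_maf, then
    compute a single aggregate r² over the pooled pairs."""
    xs: list[float] = []
    ys: list[float] = []
    for truth, dosage in zip(truth_by_variant, dosage_by_variant):
        alt_freq = sum(truth) / (2 * len(truth))  # diploid: 2 alleles per sample
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            xs.extend(truth)
            ys.extend(dosage)
    return dosage_r2(xs, ys)


if __name__ == "__main__":
    # Toy data: two variants x four samples. The second variant is monomorphic
    # in the validation set (MAF = 0), so it is excluded from the common class.
    truth = [[0, 1, 2, 1], [0, 0, 0, 0]]
    imputed = [[0.1, 0.9, 1.8, 1.2], [0.5, 0.2, 0.1, 0.0]]
    print(aggregate_r2_common(truth, imputed))
```

Pooling pairs across variants before correlating, rather than averaging per-variant r², is one design choice among several. Published comparisons differ on it, so any reproduction of the numbers above should follow the authors' own definition.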