Genotype-level quality control substantially reduces error rates in population-scale whole-genome sequencing
Abstract
Population-scale whole-genome sequencing data will contain many individual-level genotype errors, even after allele-level quality control (QC). We establish the need for genotype-level QC using UK Biobank (N=490,726) and All of Us v8 (N=414,830), where we remove up to 100 million (∼9%) additional low-quality variants. We demonstrate a reduced false-positive rate in downstream genetic association studies, highlight the power of parent-offspring trios for QC, and illustrate the need for sex-specific X-chromosome filtering. We provide a QCed All of Us v8 dataset in plink-pgen format, and an efficient pipeline for QC and conversion from VCF to plink-pgen for UK Biobank.