Genotype-level quality control substantially reduces error rates in population-scale whole-genome sequencing

Abstract

Population-scale whole-genome sequencing data will contain many individual-level genotype errors, even after allele-level quality control (QC). We establish the need for genotype-level QC using UK Biobank (N=490,726) and All of Us v8 (N=414,830), where we remove up to 100 million (∼9%) additional low-quality variants. We demonstrate a reduced false-positive rate in downstream genetic association studies, highlight the power of parent-offspring trios for QC, and illustrate the need for sex-specific X-chromosome filtering. We provide a QCed All of Us v8 dataset in plink-pgen format, and an efficient pipeline for QC and conversion from VCF to plink-pgen for UK Biobank.
