ImputePGTA: accurate embryo genotyping and polygenic scoring from ultra-low-pass sequencing
Abstract
Preimplantation genetic testing for polygenic risk (PGT-P) holds great promise for reducing lifetime disease burden, but has been held back by the difficulty of genotyping embryos. Preimplantation genetic testing for aneuploidy (PGT-A) is a standard-of-care technology used in over half of in vitro fertilization (IVF) cycles in the United States. PGT-A detects chromosomal abnormalities using ultra-low-pass (ULP) sequencing data (typically 0.002x to 0.006x) or, less commonly, genotyping array data. Here we describe ImputePGTA, a hidden Markov model-based algorithm that enables accurate reconstruction of embryo genomes from embryo array or ULP sequencing data combined with parental genome data. A key innovation of our algorithm is its ability to provide accurate embryo genotypes and polygenic scores (PGSs), along with their posterior distributions, given limited embryo data and imperfectly phased parental haplotypes, as encountered in real-world applications. The accuracy of the embryo genome reconstruction increases with the phasing quality of the parental haplotypes. We describe a method, phaseGrafter, that improves parental phasing by combining statistical phasing from short reads with read-backed phasing from long reads, which further enables phasing of rare pathogenic variants. We validate our results through simulations, downsampled gold-standard data, and comparison of six embryo genomes reconstructed from real PGT-A data to high-coverage, post-birth whole-genome sequencing data. Our imputed embryo genotypes have a dosage correlation of 0.961 with high-quality post-birth genotypes (0.998 when using embryo array data). The imputed embryo polygenic scores for 17 diseases have a mean absolute difference of 0.16 standard deviations (0.023 when using embryo array data) from PGSs calculated from high-quality post-birth genotypes, which is lower than the difference obtained by imputing array data from reference panels. We show that the attenuation in expected gains from embryo selection due to posterior uncertainty is only ∼5-10% for typical PGT-A data. Our approach removes an important technological barrier to using PGT-P and will facilitate more widespread adoption.
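The two accuracy measures reported above, dosage correlation and absolute PGS difference in standard deviations, can be illustrated with a minimal sketch. This is not the ImputePGTA implementation: the function names, the toy posteriors, the effect weights, and the `population_sd` value are all hypothetical, and the abstract does not state how the standard deviation used for scaling is defined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def expected_dosage(posterior):
    """Expected alternate-allele count from (P(0), P(1), P(2)) genotype posteriors."""
    return posterior[1] + 2.0 * posterior[2]


def polygenic_score(dosages, weights):
    """Weighted sum of per-variant dosages (additive PGS)."""
    return sum(d * w for d, w in zip(dosages, weights))


def pgs_abs_diff_in_sd(pgs_imputed, pgs_truth, population_sd):
    """Absolute PGS difference, scaled by an assumed population SD of the score."""
    return abs(pgs_imputed - pgs_truth) / population_sd


# Toy data (hypothetical): genotype posteriors at four variants for one embryo,
# the post-birth "truth" genotypes, and per-variant effect weights.
posteriors = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8), (0.7, 0.3, 0.0)]
truth = [0, 1, 2, 0]
weights = [0.2, -0.1, 0.3, 0.05]

imputed = [expected_dosage(p) for p in posteriors]

# Dosage correlation between imputed and post-birth genotypes.
dosage_r = pearson(imputed, truth)

# PGS difference in SD units; population_sd = 0.5 is an arbitrary assumption.
pgs_diff = pgs_abs_diff_in_sd(
    polygenic_score(imputed, weights),
    polygenic_score(truth, weights),
    population_sd=0.5,
)
```

Averaging `pgs_diff` over embryos and diseases would give a mean absolute difference of the kind quoted in the abstract, under the assumptions stated here.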