Long-read sequencing of trios reveals increased germline and postzygotic mutation rates in repetitive DNA
Abstract
Long-read sequencing (LRS) has improved sensitivity to discover variation in complex repetitive regions, assign parent-of-origin, and distinguish de novo germline from postzygotic mutations (PZMs). Most studies have been limited to population genetic surveys or a few families. We applied three orthogonal sequencing technologies (Illumina, Oxford Nanopore Technologies, and Pacific Biosciences) to discover and validate de novo mutations (DNMs) in 73 children from 42 autism families (157 individuals). Assaying 2.77 Gbp of the human genome using read-based approaches, we discover on average 95 DNMs per transmission (87.5 de novo single-nucleotide variants and 7.8 indels), including sex chromosomes. We estimate that LRS increases DNM discovery by 20–40% over previous Illumina-based studies of the same families, and more than doubles the discoverable number of PZMs that emerged early in embryonic development. The strict germline mutation rate is 1.30 × 10⁻⁸ substitutions per base pair per generation, strongly driven by the father's germline (3.95:1), while PZMs increase the rate by 0.23 × 10⁻⁸ with a modest but significant bias toward paternal haplotypes (1.15:1). We show that the mutation rate is significantly increased for classes of repetitive DNA, where segmental duplication (SD) mutation shows a dependence on the length and percent identity of the SD. We find that the mutation rate enrichment in repeats occurs predominantly postzygotically as opposed to in the germline, a likely result of faulty DNA repair and interlocus gene conversion.
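To make the reported figures easier to compare, the short sketch below restates them as simple arithmetic. It is an illustration only, not the authors' analysis: the function name `paternal_fraction` and all variable names are hypothetical, and the "combined" rate assumes the germline and PZM contributions are simply added, as the abstract's wording suggests.

```python
# Illustrative arithmetic on figures quoted in the abstract.
# All names here are hypothetical; this is not the authors' pipeline.


def paternal_fraction(paternal_to_maternal_ratio: float) -> float:
    """Convert a paternal:maternal ratio (e.g. 3.95:1) into the
    fraction of phased mutations on the paternal haplotype."""
    return paternal_to_maternal_ratio / (paternal_to_maternal_ratio + 1.0)


# Values taken directly from the abstract.
germline_rate = 1.30e-8   # substitutions per bp per generation
pzm_increment = 0.23e-8   # additional rate attributed to PZMs
snvs_per_transmission = 87.5
indels_per_transmission = 7.8

# Assumption: the two rate components are additive.
combined_rate = germline_rate + pzm_increment
total_dnms = snvs_per_transmission + indels_per_transmission

germline_paternal = paternal_fraction(3.95)
pzm_paternal = paternal_fraction(1.15)

print(f"combined rate: {combined_rate:.2e} per bp per generation")
print(f"DNMs per transmission (SNVs + indels): {total_dnms:.1f}")
print(f"paternal share, germline: {germline_paternal:.1%}")
print(f"paternal share, PZMs: {pzm_paternal:.1%}")
```

Read this way, a 3.95:1 ratio corresponds to roughly 80% of germline DNMs arising on the paternal haplotype, whereas 1.15:1 corresponds to only about 53% for PZMs, which is why the abstract describes the latter bias as modest.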