Minimizing detection bias of somatic mutations in a highly heterozygous oak genome
Abstract
Somatic mutations are particularly relevant for long-lived organisms. Sources of somatic mutations include imperfect DNA repair, replication errors, and exogenous damage such as ultraviolet radiation. A previous study estimated a surprisingly low number of somatic mutations in a 234-year-old individual of the pedunculate oak (Quercus robur), known as the Napoleon Oak. It has been suggested that the true number of somatic mutations was underestimated because of gaps in the reference genome and overly conservative filtering of candidate mutations. We therefore generated new high-fidelity long-read data for the Napoleon Oak (n = 12) to produce both a pseudo-haploid genome assembly and a partially phased diploid assembly. The high heterozygosity allowed complete reconstruction of phased and gapless centromeres for 22 of the 24 chromosomes. On the other hand, the high heterozygosity posed challenges for short-read alignment. Using only the pseudo-haploid assembly as a reference led to potential misalignments, while using only the diploid assembly reduced variant detection sensitivity. Because most somatic mutations are layer-specific, their observed frequency is expected to be relatively low, even when all cells in a single layer carry a specific mutation. To address this challenge, we employed a read assignment strategy that selects the appropriate reference sequence (pseudo-haploid or diploid) for each read based on alignment score and mapping quality. Ultimately, we identified 198 high-confidence somatic mutations, compared with the 17 somatic mutations identified previously from the same set of short reads. Our approach thus increased the total estimated annual mutation rate by a factor of five.
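The read assignment idea can be illustrated with a minimal sketch. The abstract states only that the reference is chosen by alignment score and mapping quality. Everything else below is an assumption made for illustration: the `Alignment` record, the `choose_reference` function, the order of the comparison (alignment score first, mapping quality as tie-breaker), the tie preference for the pseudo-haploid assembly, and the `min_mapq` cutoff of 20. It should not be read as the authors' actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one read's best alignment to one reference."""
    alignment_score: int  # e.g. the AS tag that many short-read aligners report
    mapq: int             # mapping quality of that alignment


def choose_reference(
    haploid: Optional[Alignment],
    diploid: Optional[Alignment],
    min_mapq: int = 20,  # assumed cutoff, not taken from the source
) -> str:
    """Pick the reference whose alignment looks more trustworthy for a read.

    Assumed rule: higher alignment score wins, mapping quality breaks ties,
    and a full tie goes to the pseudo-haploid assembly. A read whose winning
    alignment falls below min_mapq is left unassigned.
    """
    candidates = []
    if haploid is not None:
        candidates.append(("pseudo-haploid", haploid))
    if diploid is not None:
        candidates.append(("diploid", diploid))
    if not candidates:
        return "unassigned"

    # max() returns the first maximal item, so pseudo-haploid wins exact ties.
    label, best = max(
        candidates, key=lambda c: (c[1].alignment_score, c[1].mapq)
    )
    return label if best.mapq >= min_mapq else "unassigned"


if __name__ == "__main__":
    # A read that aligns better to one haplotype of the diploid assembly.
    print(choose_reference(Alignment(140, 60), Alignment(150, 60)))
    # Equal scores, but the diploid alignment is ambiguous (low MAPQ).
    print(choose_reference(Alignment(150, 60), Alignment(150, 3)))
```

In this sketch, a read in a region where the two haplotypes are nearly identical maps ambiguously to the diploid assembly (low mapping quality) and falls back to the pseudo-haploid reference, while a read in a divergent region is assigned to the haplotype it matches best. This mirrors the trade-off between misalignment and lost sensitivity described in the abstract.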