Telomere-to-telomere assemblies reveal complex adaptive variation of 3-ketoacyl-CoA-synthases in Populus trichocarpa likely driven by helitrons
Abstract
Background
The model woody plant Populus trichocarpa displays an atypical alkene-diverse wax cuticle likely driven by copy number variation (CNV) of 3-ketoacyl-CoA synthases (KCS), which has been difficult to confirm based on short-read assemblies. New long-read sequencing provides opportunities to develop telomere-to-telomere resources to detect cryptic variation, including CNVs, which are currently missed in traditional analyses. Integrating this information can improve genomic prediction for breeding and provide insights into the evolutionary basis of important traits.
Results
Our analysis of 78 telomere-to-telomere long-read haplotypes identified more than twice as many KCS genes as previously reported, along with numerous intragenic non-synonymous substitutions. Random forest predictive models highlighted the importance of Potri.010G079500 in producing very long chain alkenes; however, its absence did not predict previously reported alkene-deficient phenotypes. Instead, alkene levels are best predicted by combinations of KCS copies. Amino acid substitutions clustered around ligand and donor binding pockets, suggesting they contribute to differing wax cuticle composition. Finally, each KCS gene and copy was linked to a helitron transposon. A phylogenetic analysis indicates that helitrons are the evolutionary mechanism for generating KCS tandem arrays.
Conclusions
Long-read sequencing and telomere-to-telomere assemblies revealed large-effect loci critical to genetic studies that are unattainable from short reads. These approaches also have the potential to reveal novel insights into genome structure and function, such as the helitrons identified here. Our results highlight that, given current challenges in annotation and assembly, detailed and focused long-read sequences are key to interpreting complex genomic regions that contain tandem copy number variants.