Non-canonical DNA in bird telomere-to-telomere genomes

Abstract

Non-canonical (non-B) DNA motifs are genomic sequences capable of folding into three-dimensional structures distinct from the canonical right-handed helix. These structures regulate gene expression but also serve as mutation hotspots and are linked to cancer. Because non-B DNA is difficult to sequence, its annotations have been incomplete in most genome assemblies. Telomere-to-telomere (T2T) assemblies now overcome this limitation. Here, we provide a comprehensive analysis of eight types of non-B DNA motifs (e.g., G-quadruplexes and Z-DNA) in the zebra finch T2T genome. Motif content varied strongly by chromosome category: gene-rich dot chromosomes showed the highest motif levels (22.8–40.5%), microchromosomes intermediate levels (9.8–24.8%), and macrochromosomes the lowest (9.1–10.1%). Within chromosomes, Z-DNA was enriched at centromeres, and G-quadruplexes were enriched at promoters and 5' UTRs. Low methylation at G-quadruplexes suggests that these structures can form and contribute to gene regulation in these regions. Comparable patterns of non-B DNA distribution were observed in the near-T2T chicken genome, except that A-phased repeats, rather than Z-DNA, were enriched at chicken centromeres. Overall, our findings indicate that the non-B DNA distribution reflects the distinctive architecture of avian genomes, implicating non-canonical DNA in gene expression and centromere organization. The unusually high non-B DNA density on dot chromosomes is negatively correlated with PacBio sequencing depth, and thus helps explain why these chromosomes have posed exceptional challenges for sequencing.
