Genomic properties representing plant sex chromosome evolution interpreted with genome language models

Abstract

Plants have repeatedly evolved chromosomal sex-determining systems from hermaphroditic ancestors, providing a powerful natural framework for studying convergent evolution. However, sex chromosomes undergo extensive structural divergence, degeneration, and repeat accumulation, making direct comparisons across distant lineages difficult. Here, we apply a genome language model (gLM), which encodes genomic sequences into high-dimensional representations of their contextual properties, to independently evolved sex chromosomes from the distantly related genera Silene and Humulus. Without relying on sequence alignment or gene orthology, we identify convergent genomic signatures shared among plant Y chromosomes, including elevated GC content and depletion of specific trinucleotide motifs. Directionality analyses of latent genomic vectors further reveal common evolutionary trajectories associated with recombination suppression and Y chromosome differentiation. These properties differ from those observed in animal sex chromosomes, suggesting lineage-specific modes of sex chromosome evolution in plants. Our results demonstrate that genome language models can transform structurally incomparable chromosomes into quantitatively comparable evolutionary entities, allowing the interpretation of common genomic principles underlying convergent evolution across deeply diverged plant lineages.
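
The two sequence-level signatures named in the abstract, GC content and trinucleotide motif depletion, can be measured directly on raw sequence without alignment. The following is a minimal sketch of such a measurement, not the authors' gLM pipeline: the function names (`gc_content`, `trinucleotide_frequencies`, `depleted_motifs`), the overlapping-window counting, and the 0.5 depletion ratio are all illustrative assumptions.

```python
from collections import Counter
from itertools import product

# All 64 unambiguous trinucleotides, in lexicographic order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def trinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each trinucleotide over overlapping windows.

    Windows containing anything other than A, C, G, T (for example N)
    are ignored, so the 64 frequencies sum to 1 when any valid window exists.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[k] for k in TRINUCLEOTIDES)
    return {k: (counts[k] / total if total else 0.0) for k in TRINUCLEOTIDES}


def depleted_motifs(target: str, reference: str, ratio: float = 0.5) -> list[str]:
    """Trinucleotides present in `reference` whose frequency in `target`
    is below `ratio` times the reference frequency.

    The 0.5 default is an arbitrary illustrative threshold (assumption).
    """
    t = trinucleotide_frequencies(target)
    r = trinucleotide_frequencies(reference)
    return [k for k in TRINUCLEOTIDES if r[k] > 0 and t[k] < ratio * r[k]]
```

In practice `target` might be a Y-linked region and `reference` the corresponding X-linked or autosomal sequence; any real comparison would also need to account for sequence length, repeat masking, and statistical significance, which this sketch omits.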
