Extensive modulation of a conserved cis-regulatory code across 589 grass species

Abstract

The growing availability of genomes from non-model organisms offers new opportunities to identify functional loci underlying trait variation through comparative genomics. While cis-regulatory regions drive much of phenotypic evolution, linking them to specific functions remains challenging. We identified 514 cis-regulatory motifs enriched in regulatory regions of five diverse grass species, with 73% consistently enriched across all, suggesting a deeply conserved regulatory code. We then quantified conservation of specific motif instances across 589 grass species, revealing widespread gain and loss over evolutionary time. Conservation declined rapidly over the first few million years of divergence, yet ∼50% of motif instances were conserved back to the origin of grasses ∼100 million years ago. Conservation patterns varied by gene class, with modestly higher conservation at transcription factor genes. To test for adaptive cis-regulatory changes, we used phylogenetic mixed models to identify motif gains and losses associated with ecological niche transitions. Our models revealed polygenic adaptation across 810 motif-orthogroup combinations, including convergent gains of HSF/GARP motifs at an Alpha-N-acetylglucosaminidase gene associated with adaptation to temperate environments. Our results support a “stable code, variable sites” model in which cis-regulatory evolution involves extensive turnover of individual binding site instances while largely preserving transcription factors’ binding preferences. Cis-regulatory changes at hundreds to thousands of genes appear to contribute to environmental adaptation. Our results highlight the potential of comparative genomics and phylogenetic mixed models to reveal the genetic basis of complex traits.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16300462.

    Cis-regulatory elements (CREs) contain key regulatory components that fine-tune gene expression patterns. The manuscript by Hale et al. addresses how CREs evolve during adaptive divergence in plants. While many models have been proposed to explain CRE evolution, the authors focus on two contrasting hypotheses: one proposes that adaptive divergence involves shifts in transcription factor (TF) binding preference (trans changes), while the other suggests that TF preferences remain conserved but the distribution of their binding sites (TFBSs) evolves (binding site turnover). The authors refer to the latter as the 'stable code, variable sites' hypothesis. Grasses serve as a suitable model for testing this hypothesis because gene content and collinearity are conserved despite significant changes in ploidy, genome size, and regulatory patterns. The authors effectively describe the extent of cis-regulatory conservation in unmethylated regions across five model grasses and develop novel computational methods to explore how turnover in TF motifs might be related to ecological niche transitions across 589 grass species. While this study represents a major advance in our understanding of cis-regulatory evolution, we have a few concerns and questions that may help improve the interpretation of the findings. It is worth noting that the authors raise some of these concerns themselves in their discussion.

    1. The authors used OrthoFinder to construct orthogroups for 32 representative genomes. OrthoFinder is not synteny-aware and could miss true orthologs that synteny-based approaches would detect. Genomic rearrangements, including duplications and deletions, can occur among different genomes and challenge the effectiveness of this approach. Using genomic neighborhoods to infer orthologs reveals high-confidence 1:1 orthologs (Conover, Sharbrough, and Wendel 2021; Ludwig and Mrázek 2024). Additionally, the authors use BLAST to retrieve homologous sequences for predicted ancestral proteins from hundreds of short-read genome assemblies. This approach relies on the quality of ancestral sequence reconstruction for each gene family, which may be confounded by fast-evolving protein sequences, as is often the case for families involved in processes such as defense and reproduction. In light of this, might the "low-conservation genes" (Fig S10) be an artifact of orthology assignment? Orthogonal homology assignment methods might resolve this.

    2. In theory, cis-regulatory changes can only accompany phenotypic changes when they drive changes in gene expression. The authors provide a compelling argument that selection might constrain motif turnover (Fig 3A) and that variable motif abundance is associated with ecological niche transitions (Figs 4, 5), but do these constraints have an impact on gene expression? The mere presence or absence of proximal upstream TFBSs may not fully explain the expression pattern of the associated gene(s). Multi-modal (RNAseq and ATACseq) data would be needed to confirm these associations. For example, evaluating whether HSF-GARP motif abundance influences steady-state or cold-inducible abundance of OG0018131 transcripts (Fig 5) in a subset of grasses would provide functional support for the authors' claims.

    3. While the authors argue for a "stable code, variable sites" model of expression evolution, the conclusion appears to rest heavily on the conservation of transcription factor binding sites in a limited set of representative grasses (Fig 1). The study then focuses exclusively on the 377 motifs most conserved across the selected five genomes. We wonder whether this may have biased the conclusion regarding stable code and variable sites. To address this, the study could include a global assessment (across 589 grass species) of the least conserved motifs (the other 137 motifs not conserved across the selected five genomes). However, this raises another challenge: identifying biologically relevant motifs in non-model organisms. Multi-modal tests (ATACseq and RNAseq) do not scale to so many species and would require higher-quality genome sequences. Additionally, although the authors acknowledge the possibility, they do not present evidence that would rule out a significant role for trans-regulatory evolution. Their conclusion may therefore reflect the study's focus on cis-regulatory evolution rather than a comprehensive test of the different regulatory mechanisms.

    4. The comparative study focused exclusively on the 500 bp upstream of genes' translation start sites, a simplification of gene regulation that makes a general view of the evolutionary context tractable. However, CREs can be located much farther upstream and even downstream. There could additionally be lineage-specific differences in which motifs are conserved but their upstream or downstream location becomes more relevant to gene regulation. Greater sequencing depth for the genomes of non-model organisms could help address this limitation. Moreover, the authors cite a 'higher proportion of TEs beyond 500 bp upstream' as the reason for restricting their analysis to 500 bp upstream. Since TEs can be a source of CRE evolution, particularly in environmental adaptation, we believe that some of the de novo CREs involved in niche transitions may have been overlooked.
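
    To make the window discussed in point 4 concrete, here is a minimal Python sketch of how a fixed-length region upstream of a translation start site could be extracted on either strand and checked for a motif. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the 0-based half-open coordinate convention, and the use of exact string matching in place of position weight matrix scanning are all hypothetical choices made for this example.

```python
# Illustrative sketch only; names and conventions are assumptions, not the
# method used in the manuscript. Coordinates are 0-based, half-open, and given
# on the forward strand of the genome sequence.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_window(genome: str, cds_start: int, cds_end: int,
                    strand: str, length: int = 500) -> str:
    """Return up to `length` bp upstream of the translation start site.

    For a '+' gene the start codon begins at cds_start, so the window ends
    there. For a '-' gene the start codon sits at the high-coordinate end, so
    the window begins at cds_end and is reverse-complemented to read 5'->3'
    relative to the gene. Windows are truncated at sequence ends.
    """
    if strand == "+":
        return genome[max(0, cds_start - length):cds_start]
    if strand == "-":
        return reverse_complement(genome[cds_end:cds_end + length])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def has_motif(window: str, motif: str) -> bool:
    """Exact-match presence test on either strand (a stand-in for PWM scanning)."""
    return motif in window or reverse_complement(motif) in window


# Toy example with a 4 bp window so the result is easy to check by eye.
genome = "ACGTTTGACCATGAAATAGGGCAT"
plus_window = upstream_window(genome, 10, 19, "+", length=4)
minus_window = upstream_window(genome, 13, 18, "-", length=4)
```

    Widening `length`, or adding a second window downstream of the gene, would be one way to probe how sensitive presence/absence calls are to the 500 bp cutoff, subject to the assembly-quality and TE-content caveats raised above.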

    Overall, the study represents a unique and scalable way of exploring the conservation of CREs using advanced computational modelling and large genomic datasets. While the conclusions reported herein rely heavily on the quality of genome assemblies and the rigor of analytical pipelines, we believe this study represents a conceptual advance in our understanding of expression evolution. As the feasibility of generating orthogonal transcriptome and chromatin accessibility datasets increases, these findings can be functionally validated.

    Acknowledgement

    This preprint was discussed in the BPSC 240 course offered by Sunil Kenchanmane Raju in the Department of Botany and Plant Sciences at the University of California, Riverside, in Spring 2025. The authors thank the participants of the course for the detailed discussions, particularly Angel Morris, Skyler Wong, Wesley George, and Simoné Murguia.

    References

    Conover, Justin L., Joel Sharbrough, and Jonathan F. Wendel. 2021. "PSONIC: Ploidy-Aware Syntenic Orthologous Networks Identified via Collinearity." G3 (Bethesda, Md.) 11 (8). https://doi.org/10.1093/g3journal/jkab170.

    Ludwig, J., and J. Mrázek. 2024. "OrthoRefine: Automated Enhancement of Prior Ortholog Identification via Synteny." BMC Bioinformatics 25 (1): 163.

    Competing interests

    The authors declare that they have no competing interests.