A transformer-based language model reveals developmental constraint and network complexity during zebrafish embryogenesis


Abstract

Understanding how regulatory complexity and constraint shape organismal development remains a central challenge in biology. The developmental hourglass framework posits that mid-embryogenesis, the phylotypic stage, is a period of heightened conservation and regulatory organization. We test this using Zebraformer, a transformer-based language model trained on single-cell transcriptomic data from zebrafish embryos. Unlike models focused on prediction or classification, Zebraformer learns context-sensitive representations of genes and cells that encode temporal progression, anatomical identity, and regulatory relationships. These embeddings reflect differentiation timing and transcriptional divergence, while attention-derived gene networks reveal a transient rise in complexity during the phylotypic stage. This stage also exhibits increased perturbation sensitivity and a shift toward a centralized, modular network topology. These features are supported by graph-theoretic metrics and gene ontology enrichment, offering data-driven evidence for highly structured regulation during mid-embryogenesis. Our results demonstrate that language models can extract interpretable biological structure from high-dimensional data and provide support for longstanding developmental theory.
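The abstract does not state how attention weights are turned into gene networks or which graph metrics quantify "centralized" and "modular" topology. Purely as an illustration, the sketch below assumes a gene-by-gene attention matrix (for instance, averaged over heads and cells), symmetrizes and thresholds it into an undirected graph, and then computes two standard graph-theoretic quantities: Freeman degree centralization and Newman modularity of a fixed gene partition. The function names, the threshold, and the toy matrices are hypothetical and are not taken from the Zebraformer pipeline.

```python
from itertools import combinations


def attention_to_graph(attn, genes, threshold):
    """Hypothetical helper: symmetrize attention and keep edges at or above threshold."""
    edges = set()
    for i, j in combinations(range(len(genes)), 2):
        # Average both attention directions so the resulting graph is undirected.
        weight = (attn[i][j] + attn[j][i]) / 2
        if weight >= threshold:
            edges.add((genes[i], genes[j]))
    return edges


def degrees(genes, edges):
    """Count the number of edges incident to each gene."""
    deg = {g: 0 for g in genes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_centralization(genes, edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(genes)
    if n < 3:
        return 0.0
    deg = degrees(genes, edges)
    d_max = max(deg.values())
    return sum(d_max - d for d in deg.values()) / ((n - 1) * (n - 2))


def modularity(genes, edges, community):
    """Newman modularity Q of a given partition (community maps gene -> module label)."""
    m = len(edges)
    if m == 0:
        return 0.0
    deg = degrees(genes, edges)
    adj = set(edges) | {(v, u) for u, v in edges}
    q = 0.0
    for u in genes:
        for v in genes:
            if community[u] != community[v]:
                continue  # only same-module pairs contribute to Q
            a_uv = 1.0 if (u, v) in adj else 0.0
            q += a_uv - deg[u] * deg[v] / (2 * m)
    return q / (2 * m)


# Toy example: a hub-dominated network versus a two-module network.
genes = ["a", "b", "c", "d"]
star_attn = [
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.1, 0.1],
    [0.8, 0.1, 0.0, 0.1],
    [0.7, 0.1, 0.1, 0.0],
]
modular_attn = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
star = attention_to_graph(star_attn, genes, threshold=0.5)
pairs = attention_to_graph(modular_attn, genes, threshold=0.5)
print(degree_centralization(genes, star))  # 1.0
print(modularity(genes, pairs, {"a": 0, "b": 0, "c": 1, "d": 1}))  # 0.5
```

In practice a community-detection algorithm would supply the partition rather than fixing it by hand, and the threshold choice strongly affects both metrics, so comparisons across developmental stages would need a consistent edge-selection rule.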

Significance statement

Understanding how cells coordinate to build a complex organism remains a central challenge in biology. Development is not only genetically encoded but also context-dependent, shaped by dynamic interactions among genes, cells, and time. Here, we use a transformer-based language model, Zebraformer, trained on single-cell gene expression data from zebrafish embryos, to investigate how regulatory structure evolves during development. The model captures key features of organismal formation: increasing transcriptional divergence, anatomical specificity, and a transient rise in regulatory complexity and perturbation sensitivity during the conserved phylotypic stage. These findings provide data-driven support for the developmental hourglass hypothesis and demonstrate that contextual models can uncover fundamental organizational principles from biological data alone.