Deep-Plant: a supervised foundation model for plant regulatory genomics


Abstract

Large-scale sequence-to-function deep learning models have demonstrated unparalleled ability to model biological sequences and have revolutionized the field of regulatory genomics. However, the majority of such efforts have centered on human and mammalian systems, leaving plant regulatory genomics comparatively underexplored. To address this gap, we introduce Deep-Plant, a supervised foundation model trained to predict chromatin state directly from genomic sequence. In contrast to large language models, which are trained in a self-supervised manner using sequence alone, our model is trained to predict chromatin state across tissues and conditions. Training the model on a large collection of genome-wide experiments, including DNA accessibility, transcription factor binding, and histone modifications, provides it with added biological context beyond the sequence itself. We demonstrate that the resulting model is an effective platform for developing accurate models of regulatory activity relevant to gene expression and active enhancers, exhibiting large improvements in speed, accuracy, and interpretability over the complementary approach of fine-tuning DNA language models. Deep-Plant models are available in Arabidopsis and rice, and work well as a building block for sequence modeling in related species such as corn. Together, these results establish supervised, chromatin-informed foundation models as a practical and effective paradigm for regulatory sequence modeling in plants.
