A deep learning model captures position-specific effects of plant regulatory sequences and suggests genes under complex regulation

Kevin C. Rockenbach
Silvia F. Zanini
Alison C. Tidy
Richard J. Morris
Rachel Wells
Agnieszka A. Golicz

Abstract

Deep neural networks can be trained to predict gene expression directly from genomic sequence, implicitly learning regulatory sequence patterns from scratch and minimizing the bias imposed by prior assumptions. A challenging yet promising prospect is to extract novel insights into gene-regulatory mechanisms by probing and interpreting such gene expression models. Using a branched convolutional neural network architecture trained on promoter and terminator sequences, we predict gene expression for the allopolyploid Brassica napus and the closely related model organism Arabidopsis thaliana. We validate the model by comparing predicted and measured expression across ecotypes. We also show that deep learning models can successfully capture the positional binding preferences of some transcription factor families without having been trained on transcription factor binding data. Furthermore, we show that our model not only detected local sequence patterns but was also able to determine their function based on their positional context. We also found that increased prediction error correlated with additional, more distal or epigenetic regulatory input. Our results demonstrate that deep learning can be used to understand the regulatory architecture of gene expression in plants. A better understanding of gene regulation in the context of polyploid genomes is of particular economic importance because of their prevalence among major crops. In the future, we hope that such models may facilitate the targeted engineering of gene regulation in crops.
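
To make the idea of a branched convolutional model on promoter and terminator sequences more concrete, the following is a minimal, purely illustrative sketch in plain Python. It is not the authors' architecture: the function names, the two example filters ("TATA" and "AATAAA"), the toy sequences, and the output weights are all hypothetical assumptions chosen for illustration. Each region is one-hot encoded, scanned by its own filter bank, pooled, and the two branches are concatenated before a linear output.

```python
# Illustrative sketch only: not the authors' architecture, weights, or data.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_max(encoded, kernel):
    """Slide one filter (width x 4) along the sequence, apply ReLU, keep the maximum activation."""
    width = len(kernel)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, max(score, 0.0))  # ReLU followed by global max pooling
    return best

def branched_forward(promoter, terminator, promoter_kernels, terminator_kernels, weights, bias):
    """Run each region through its own convolutional branch, concatenate, apply a linear layer."""
    features = [conv1d_max(one_hot(promoter), k) for k in promoter_kernels]
    features += [conv1d_max(one_hot(terminator), k) for k in terminator_kernels]
    return sum(w * f for w, f in zip(weights, features)) + bias

# Hypothetical filters: an exact-match "TATA" detector for the promoter branch
# and an exact-match "AATAAA" detector for the terminator branch.
tata_kernel = one_hot("TATA")
polya_kernel = one_hot("AATAAA")

score = branched_forward(
    promoter="GCGCTATAAAGGC",      # toy sequence, assumed for illustration
    terminator="CCGAATAAAGTT",     # toy sequence, assumed for illustration
    promoter_kernels=[tata_kernel],
    terminator_kernels=[polya_kernel],
    weights=[0.5, 0.25],           # hypothetical output weights
    bias=0.0,
)
print(score)
```

Note that the global max pooling used here discards where a motif occurs. A model meant to capture the position-specific effects described in the abstract would need to retain positional information, for example by pooling over local windows instead; the sketch keeps global pooling only for brevity.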

Version published to 10.1101/2025.08.30.673246 on bioRxiv
Sep 4, 2025

Related articles

DNABERT2-CAMP: A Hybrid Transformer-CNN Model for E. coli Promoter Recognition

This article has 4 authors:
1. Hua-Lin Xu
2. Xiu-Jun Gong
3. Hua Yu
4. Ying-Kai Wang
This article has no evaluations. Latest version: Dec 28, 2025

Convolutional Deep Learning Approach to identify DNA Sequences for Gene Prediction

This article has 2 authors:
1. Jesus Antonio Motta
2. Pedro David Gomez
This article has no evaluations. Latest version: Jan 27, 2026

In-Context Learning in Genomic Language Models as a Biological Evaluation Task

This article has 2 authors:
1. Aadit Kapoor
2. Wendy Lee
This article has no evaluations. Latest version: Dec 9, 2025
