Modelling genetic variation effects in plant gene regulatory networks using transfer learning on genomic and transcription factor binding data
Abstract
The sequence-specific recognition of cis-regulatory elements (CREs) in non-coding DNA by transcription factors (TFs) is a crucial step in propagating genotype information to plant phenotype. Yet our understanding of how genetic variation in CREs affects target gene activity remains limited, owing to the high diversity of regulatory elements and the conditional nature of their interactions. Here, we address this challenge using an explainable AI approach. We develop and implement a multi-label deep learning model, trained on the extensive DNA-binding data resources available for Arabidopsis thaliana, to systematically capture how DNA sequence features, their context, and their syntax influence transcription factor occupancy across the genome. Once trained, the model is applied to new condition- and genotype-specific scenarios, where it annotates cistrome-wide TF-binding sites in their native chromatin context and uncovers condition-specific regulatory syntax and the corresponding gene regulatory modules. Further, by integrating large-scale genomic and GWAS data from Arabidopsis, our approach predicts differential TF binding and annotates regulatory gene variants within known quantitative trait loci, thereby establishing a direct link between cis-regulatory variation and phenotypic outcomes. Finally, applying our model in a non-specific protein-DNA interaction assay on Zea mays under control and heat-stress conditions demonstrates its potential to detect and characterize condition-responsive TF binding in phylogenetically distant crops.
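The idea of predicting differential TF binding from sequence variants can be illustrated with a minimal sketch. It assumes a multi-label model that maps a one-hot encoded DNA window to one binding probability per TF, and scores a variant as the difference between the predictions for the alternate and reference alleles. The scorer below is a hypothetical stand-in (a linear layer with random weights followed by per-TF sigmoids), not the architecture described in the article, and all function names (`one_hot`, `toy_multilabel_model`, `variant_effect`) are illustrative.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) matrix; unknown bases (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def toy_multilabel_model(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained network: one independent sigmoid output per TF.

    x has shape (length, 4); weights has shape (length, 4, n_tf); bias has shape (n_tf,).
    """
    logits = np.tensordot(x, weights, axes=([0, 1], [0, 1])) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def variant_effect(ref_seq: str, pos: int, alt_base: str,
                   weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Return per-TF change in predicted binding probability (alternate minus reference)."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    p_ref = toy_multilabel_model(one_hot(ref_seq), weights, bias)
    p_alt = toy_multilabel_model(one_hot(alt_seq), weights, bias)
    return p_alt - p_ref


# Illustrative usage with random (untrained) weights for three hypothetical TFs.
rng = np.random.default_rng(0)
ref = "ACGTGACGTCAGGTCA"
n_tf = 3
weights = rng.normal(size=(len(ref), 4, n_tf)).astype(np.float32)
bias = np.zeros(n_tf, dtype=np.float32)

delta = variant_effect(ref, pos=5, alt_base="T", weights=weights, bias=bias)
print(delta.shape)  # one score per TF
```

Independent sigmoid outputs are what make the setup multi-label: each TF can be predicted as bound or unbound at the same site regardless of the others, and a variant can raise occupancy for one TF while lowering it for another.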