Expanding the DNA Motif Lexicon of the Transcriptional Regulatory Code
Abstract
Transcriptional regulatory sequences in metazoans contain intricate combinations of transcription factor (TF) motifs. Stereospecific arrangements of simple motifs constitute composite elements (CEs) that enhance DNA-protein interaction specificity and enable combinatorial regulatory logic. Despite their importance, CEs remain underexplored. We advance CE discovery and functional characterization by developing an integrated framework that combines computational prediction, experimental testing, and deep learning. The extended TF motif catalog comprises both synergistic and counteracting CEs, which are supported by evidence of TF binding in vivo and in vitro. A deep learning model, GRACE, trained on customized massively parallel reporter assays, learns the lexicon of CEs at single-nucleotide resolution. Comparative analysis with a neural network model trained on chromatin accessibility demonstrates both striking convergence and distinctions within the expanded regulatory lexicon, enabling joint predictions of motif contributions and of the impact of variants on chromatin structure and transcriptional activity in diverse cellular contexts.