DNA Conformational Flexibility Descriptors Improve Transcription Factor Binding Prediction Across the Protein Families
Abstract
Precise binding of transcription factors (TFs) to specific DNA sequences is fundamental to gene regulation, yet the molecular principles underpinning TF–DNA specificity remain incompletely understood. While nucleotide sequence and DNA shape are known determinants of TF binding, the role of DNA flexibility, encompassing axial, torsional, and stretching dynamics, remains largely unexplored, particularly across diverse TF families. Here, we systematically integrate experimentally and computationally derived DNA flexibility descriptors into predictive models of TF–DNA binding specificity. Through extensive analyses of large-scale in vitro datasets from HT-SELEX, SELEX-Seq, and protein binding microarrays encompassing mammalian and Drosophila TFs, we demonstrate that flexibility-augmented models consistently outperform sequence-based models and, to a lesser extent, DNA shape-augmented models. These improvements are robust across diverse experimental platforms and dataset scales, underscoring the importance of DNA conformational dynamics in indirect readout. Quantitative analyses of position-specific flexibility contributions reveal distinct “flexibility hotspots” within transcription factor binding sites and their flanking regions. This is exemplified by structural insights into the homeodomain TF MSX1, where localized DNA bendability directly correlates with enhanced binding affinity and precise recognition specificity. Finally, leveraging in vivo ChIP-Seq and DNase-Seq data from ENCODE, we further validate that DNA flexibility substantially enhances the identification of functional TF binding sites across various TF families and cellular contexts. Collectively, these findings substantiate DNA flexibility as a fundamental element of the cis-regulatory code and significantly advance predictive frameworks of gene regulatory networks.
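To make the idea of a "flexibility-augmented" model concrete, the following is a minimal sketch of how a sequence feature vector might be extended with per-step flexibility values. It is not the authors' pipeline: the function names, the dinucleotide-step lookup, and the score table are all assumptions made for illustration, and the table holds arbitrary placeholder numbers, not measured flexibility parameters.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a one-hot vector (4 values per position)."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def flexibility_profile(seq, step_scores):
    """Look up one score per dinucleotide step, giving len(seq) - 1 values."""
    return [step_scores[seq[i:i + 2]] for i in range(len(seq) - 1)]


def augmented_features(seq, step_scores):
    """Concatenate sequence features with the flexibility profile."""
    return one_hot(seq) + flexibility_profile(seq, step_scores)


# Hypothetical placeholder table: these are NOT real flexibility values.
# A real analysis would substitute an experimentally or computationally
# derived descriptor for each of the 16 dinucleotide steps.
PLACEHOLDER_SCORES = {a + b: 0.0 for a, b in product(BASES, repeat=2)}
PLACEHOLDER_SCORES["TA"] = 1.0  # arbitrary value chosen only for the demo

features = augmented_features("GCTAAT", PLACEHOLDER_SCORES)
```

The resulting vector could be passed to any regression or classification model in place of the sequence-only encoding, so that the two feature sets can be compared on the same binding data.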