Deep Learning of High-throughput Transcription Factor–DNA Binding Affinity Data: Quantitative Comparison with Pairwise-Additive Models
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences. Widely used models of TF–DNA binding, such as position weight matrices (PWMs) and position-specific affinity matrices (PSAMs), assume that the binding free energy is the sum of independent base contributions. However, there is ample evidence that non-additive effects significantly influence TF binding. Here, we use data from a high-throughput in vitro assay (ivt FOODIE) to generate genome-scale TF–DNA dissociation constants (K_d) and systematically evaluate sequence-to-affinity models of increasing complexity. We demonstrate that pairwise additive models exhibit systematic deviations from the measured affinity landscapes. Models incorporating adjacent dinucleotide interactions and deep learning architectures achieve markedly improved agreement with experimental K_d values. The magnitude of this non-pairwise-additivity depends strongly on the TF family. In silico mutation screening reveals widespread, TF-specific long-range interposition dependencies, highlighting the role of energetic coupling across distant positions in target recognition. These results provide a quantitative framework for comparing non-pairwise-additive energetic effects across diverse TFs.
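As an illustration of the model classes contrasted in the abstract, the sketch below scores a short site first as a sum of independent per-position contributions and then with added adjacent-dinucleotide correction terms. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the three-position site, and all energy values are hypothetical toy choices, and the conversion to a relative K_d uses the standard relation K_d / K_d,ref = exp(ΔΔG / RT).

```python
import math

RT = 0.592  # kcal/mol at roughly 298 K

def additive_energy(seq, mono):
    """Independent-base model: ddG = sum over positions i of mono[i][base_i]."""
    return sum(mono[i][base] for i, base in enumerate(seq))

def dinucleotide_energy(seq, mono, di):
    """Additive term plus corrections for adjacent base pairs (i, i+1)."""
    pair_term = sum(di[i].get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))
    return additive_energy(seq, mono) + pair_term

def relative_kd(ddg, rt=RT):
    """K_d relative to the reference sequence (ddG = 0 gives a ratio of 1)."""
    return math.exp(ddg / rt)

# Hypothetical toy parameters (kcal/mol), invented for illustration only.
# The reference (best) site under the additive model is "ACG".
mono = [
    {"A": 0.0, "C": 1.0, "G": 1.2, "T": 0.8},
    {"A": 0.9, "C": 0.0, "G": 1.1, "T": 0.7},
    {"A": 1.0, "C": 0.6, "G": 0.0, "T": 1.3},
]
# Sparse adjacent-dinucleotide corrections; unlisted pairs contribute 0.
di = [{"CC": -0.5}, {"CT": 0.4}]

for seq in ("ACG", "CCG", "CCT"):
    e_add = additive_energy(seq, mono)
    e_di = dinucleotide_energy(seq, mono, di)
    print(seq, round(e_add, 3), round(e_di, 3), round(relative_kd(e_di), 3))
```

In this toy setting the two models agree on the reference site and differ only where a listed dinucleotide occurs, which is the kind of deviation from strict additivity that a comparison against measured K_d values would expose.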