Improving interpretability of transcription factor binding models with DNA shape features

Ryan L. Keivanfar
Forest Yang
Katherine S. Pollard
Nilah M. Ioannidis

Abstract

Deep learning models in genomics that predict molecular phenotypes from DNA sequence traditionally focus on one-hot encoded representations. Here, we develop a novel model that extends this approach by incorporating DNA structural attributes indicative of local shape alongside canonical sequence inputs. This augmentation provides an additional axis for model interpretability and aids in identifying regulatory patterns not apparent from sequence alone. Applying this approach to prediction of transcription factor binding (ChIP-seq) demonstrates that combining sequence and structural DNA data can improve the identification of regulatory elements to provide a more nuanced understanding of genomic function and regulation.
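As a rough illustration of the input representation described above, the sketch below one-hot encodes a DNA sequence and appends per-position shape values as extra channels. It is not the authors' implementation: the function names, the track names `shape_a` and `shape_b`, and the numeric values are hypothetical placeholders, and the abstract does not specify which shape attributes or what model architecture are used.

```python
# Hypothetical sketch: sequence one-hot channels plus per-position shape channels.
# Names and values are illustrative assumptions, not taken from the preprint.
BASES = "ACGT"


def one_hot(seq):
    """Return one row per base: a 4-element 0/1 list in A, C, G, T order.

    Any character outside ACGT (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def add_shape_channels(seq, shape_tracks):
    """Append one value per shape track to each position's one-hot row.

    shape_tracks maps a track name to a list with one value per base.
    """
    for name, values in shape_tracks.items():
        if len(values) != len(seq):
            raise ValueError(
                f"track {name!r} has {len(values)} values, expected {len(seq)}"
            )
    rows = one_hot(seq)
    return [
        row + [values[i] for values in shape_tracks.values()]
        for i, row in enumerate(rows)
    ]


seq = "ACGT"
# Placeholder numbers only; real shape values would come from a shape predictor.
tracks = {
    "shape_a": [0.1, 0.2, 0.3, 0.4],
    "shape_b": [1.0, 0.5, 0.0, -0.5],
}
features = add_shape_channels(seq, tracks)
```

Each position ends up with four sequence channels followed by one channel per shape track, so a downstream model can attribute its predictions to either sequence or shape inputs.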

Version published to bioRxiv on Apr 3, 2025 (DOI: 10.1101/2025.04.01.646034)

Related articles

DNABERT2-CAMP: A Hybrid Transformer-CNN Model for E. coli Promoter Recognition

This article has 4 authors:
1. Hua-Lin Xu
2. Xiu-Jun Gong
3. Hua Yu
4. Ying-Kai Wang
This article has no evaluations
Latest version Dec 28, 2025

Explicit Dynamic Cross-Strand Interactions for DNA Sequence Language Modeling

This article has 12 authors:
1. Xiao Luo
2. Cheng Yang
3. Yuansheng Liu
4. Lei Ling
5. Fengxin Li
6. Changjian Chen
7. Long Wang
8. Feng Yu
9. Liang Qiao
10. Xiangxiang Zeng
11. Kenli Li
12. Alexander Schönhuth
This article has no evaluations
Latest version Jan 8, 2026

Benchmarking Genomic Foundation Models for Gene Fusion Detection from DNA Sequences

This article has 5 authors:
1. Radim Krupička
2. Mariana Komárková
3. Bohuslav Dvorský
4. Kateřina Kollinová
5. Ondřej Klempíř
This article has no evaluations
Latest version Dec 23, 2025
