Exploring Protein-DNA Binding Residue Prediction and Consistent Interpretability Analysis Using Deep Learning

Yufan Liu

Abstract

Accurately identifying DNA-binding residues is a crucial step in developing computational tools to model DNA-protein binding properties, which is essential for binding pocket discovery and related drug design. Although several tools have been developed to predict DNA-binding residues based on protein sequences and structures, their performance remains limited, and proteins with crystal structures still represent only a small fraction of DNA-binding proteins. Additionally, the process of extracting handcrafted features for protein representation is labor-intensive. In this study, we combined the strengths of pre-trained protein language models and attention mechanisms to propose a sequence-based method: an attention-based deep learning approach for accurately predicting DNA-binding residues, incorporating a contrastive learning module. Our method outperformed all other sequence-based models across two prevalent benchmark datasets. Furthermore, we developed a structure-based graph neural network (GNN) model to demonstrate the impact of the contrastive module. A common limitation of existing models is their lack of interpretability, which hinders our ability to understand what these models have learned. To address this, we introduced a novel perspective for interpreting our sequence-based model by analyzing the consistency between attention scores and the edge weights generated by the GNN model. Interestingly, our results show that large-scale pre-trained protein language models, together with attention mechanisms, can effectively capture structural information solely from protein sequence inputs.
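The abstract does not say how the consistency between attention scores and GNN edge weights is measured. As a minimal sketch of one plausible approach, the example below assumes a Pearson correlation computed over the residue pairs that have a GNN edge. The function names, the data layout (a square attention matrix plus a dictionary of edge weights), and the toy values are all hypothetical and are not taken from the preprint.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def attention_edge_consistency(attention, edge_weights):
    """Correlate attention scores with GNN edge weights over shared residue pairs.

    attention:    square list of lists; attention[i][j] is the score residue i
                  assigns to residue j (assumed layout).
    edge_weights: dict mapping (i, j) to a learned edge weight (assumed layout).
    """
    pairs = sorted(edge_weights)  # fixed order so both lists line up
    att = [attention[i][j] for i, j in pairs]
    wts = [edge_weights[pair] for pair in pairs]
    return pearson(att, wts)


# Toy inputs, invented purely for illustration.
attention = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]
edge_weights = {(0, 1): 0.7, (1, 2): 0.6, (0, 2): 0.2, (2, 1): 0.9}

score = attention_edge_consistency(attention, edge_weights)
print(f"attention vs. edge-weight correlation: {score:.3f}")
```

A value near 1 would suggest that the sequence model attends most strongly to the residue pairs the structure-based model weights most heavily. A rank-based measure such as Spearman correlation would be an equally reasonable choice if the two score scales are not linearly related.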

Version published to 10.1101/2024.10.12.613667 on bioRxiv, Oct 14, 2024

Related articles

Convolutional Deep Learning Approach to identify DNA Sequences for Gene Prediction

This article has 2 authors:
1. Jesus Antonio Motta
2. Pedro David Gomez
This article has no evaluations. Latest version: Jan 27, 2026

Deep Learning Approaches for Accurate RNA 3D Structure Prediction from Primary Sequences

This article has 1 author:
1. Nnaemeka Kingsley Ugwumba
This article has no evaluations. Latest version: Jan 29, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025
