PDS3M: A Self-Supervised State Space Model for Protein-DNA Binding Prediction

Abstract

Accurate prediction of protein-DNA binding sites remains challenging because existing methods capture sequence context poorly and the data are heavily class-imbalanced. We introduce PDS3M, a novel deep learning model for predicting protein-DNA binding residues. PDS3M integrates a self-supervised embedding module with a selective state space model. The self-supervised module uses a masking strategy to learn contextual relationships within protein sequences, which improves the initial feature representation. The selective state space model refines this representation by combining convolution with a selective scan mechanism that focuses on the relevant sequence information at each step. In addition, we use a weighted binary cross-entropy loss computed on logits to address the significant class imbalance inherent in protein-DNA binding datasets, giving more weight to positive (binding) residues during training. Experimental evaluations on multiple benchmark datasets show that PDS3M achieves well-balanced performance and outperforms existing methods, as evidenced by higher Matthews correlation coefficient (MCC) values. An extended version, PDS3M+, which uses pre-trained embeddings, improves the average MCC by approximately 18% over the state-of-the-art method.
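The weighted loss and the evaluation metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the `pos_weight` value are assumptions chosen for illustration, and the abstract does not state which weight PDS3M actually uses. The loss scales the positive-class term by `pos_weight`, and the MCC is computed from the confusion-matrix counts.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean binary cross-entropy on raw logits, with positives up-weighted.

    pos_weight is a hypothetical hyperparameter, e.g. the ratio of
    non-binding to binding residues in the training set.
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # -log(sigmoid(z)) == softplus(-z); -log(1 - sigmoid(z)) == softplus(z)
        total += pos_weight * y * softplus(-z) + (1 - y) * softplus(z)
    return total / len(logits)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy example: one binding residue among four, illustrative weight of 3.
    logits = [2.0, -1.5, -0.5, -3.0]
    labels = [1, 0, 0, 0]
    print(weighted_bce_with_logits(logits, labels, pos_weight=3.0))
    preds = [1 if z > 0 else 0 for z in logits]
    print(matthews_corrcoef(labels, preds))
```

Up-weighting positives makes a missed binding residue cost more than a false alarm, which counteracts the tendency of an imbalanced dataset to push predictions toward the majority (non-binding) class. MCC suits this setting because it uses all four confusion-matrix counts and stays near zero for a classifier that simply predicts the majority class.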
