PDS3M: A Self-Supervised State Space Model for Protein-DNA Binding Prediction
Abstract
Accurate prediction of protein-DNA binding sites remains challenging owing to limitations in capturing sequence context and to class imbalance. We introduce PDS3M, a novel deep learning model for predicting protein-DNA binding residues. PDS3M integrates a self-supervised learning embedding module with a selective state space model. The self-supervised module, which employs a masking strategy, enhances the initial feature representation by learning contextual relationships within protein sequences. The selective state space model refines this information by combining convolution with a selective scan mechanism that focuses on the relevant sequence information at each step. In addition, we use a weighted binary cross-entropy logistic loss function to address the significant data imbalance inherent in protein-DNA binding datasets, giving more weight to positive instances during training. Experimental evaluations on multiple benchmark datasets demonstrate that PDS3M achieves well-balanced performance and superior prediction results compared with existing methods, as evidenced by higher Matthews Correlation Coefficient values. An extended version, PDS3M+, which uses pre-trained embeddings, achieves an average Matthews Correlation Coefficient improvement of approximately 18% over the state-of-the-art method.
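To make the weighted loss concrete, the following is a minimal sketch of a weighted binary cross-entropy computed from raw logits, in which positive (binding) residues are scaled by a factor. It is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_bce_with_logits` and the parameter `pos_weight` are hypothetical, and the abstract does not state which weight value PDS3M uses.

```python
import math


def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean weighted binary cross-entropy over residues, from raw logits.

    logits     : per-residue scores before the sigmoid
    labels     : 1 for a binding residue, 0 otherwise
    pos_weight : hypothetical hyperparameter; values > 1 up-weight the
                 rare positive class (value not given in the abstract)
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable softplus(-z) = log(1 + exp(-z))
        softplus_neg = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
        # -log(sigmoid(z))     = softplus(-z)       (positive term)
        # -log(1 - sigmoid(z)) = z + softplus(-z)   (negative term)
        total += pos_weight * y * softplus_neg + (1 - y) * (z + softplus_neg)
    return total / len(logits)


# Example: one binding and one non-binding residue, both with logit 0,
# with the positive residue counted three times as heavily.
loss = weighted_bce_with_logits([0.0, 0.0], [1, 0], pos_weight=3.0)
```

One common heuristic, assumed here rather than taken from the source, is to set `pos_weight` to the ratio of non-binding to binding residues in the training set, so that both classes contribute comparably to the gradient.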