EPEPDI: prediction of binding free energy changes from missense mutations in double and single-stranded DNA-binding proteins


Abstract

Predicting changes in binding free energy caused by missense mutations (MMs) in protein-DNA interactions (PDIs) is vital for understanding disease mechanisms and advancing therapeutic strategies. However, many existing models fail to account for the distinct characteristics of MMs in double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs). To address this, we constructed a comprehensive dataset from diverse sources that explicitly distinguishes between DSBs and SSBs. Using sequence-based embeddings from pre-trained protein language models, including ESM2, ProtTrans, and ESM1v, we developed EPEPDI, a deep learning framework that integrates these embeddings through a multi-channel architecture. To refine predictive accuracy, we introduced an information entropy-based algorithm that identified 181 residues as the optimal sequence length, the point at which amino acid contributions and entropy dynamics are balanced. This approach improves both precision and computational efficiency, enabling scalable analysis of mutation impacts on DNA-binding proteins. Ablation studies validated the optimal feature combination, and comparisons show that EPEPDI outperforms existing approaches, achieving an average Pearson correlation coefficient of 0.755 on the MPD276 dataset in ten-fold cross-validation and 0.632 on independent tests covering both DSBs and SSBs. This work highlights the importance of distinguishing DSBs from SSBs in PDIs and shows the potential of advanced machine learning in biological research.
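The abstract does not specify how the 181-residue input is positioned on the protein or how entropy is measured. As a rough illustration only, the sketch below assumes a fixed-length window centred on the mutated residue (padded with a hypothetical `X` symbol near sequence ends) and computes the Shannon entropy of the window's amino-acid composition. The names `mutation_window`, `apply_mutation`, and `shannon_entropy` are hypothetical helpers, and this is not EPEPDI's published algorithm.

```python
import math
from collections import Counter

# Assumption: a window of 181 residues centred on the mutated position
# (90 residues on each side). The abstract reports 181 as the optimal
# length but does not say how the window is placed.
WINDOW_LENGTH = 181
PAD = "X"  # hypothetical padding symbol for windows that run past a sequence end


def apply_mutation(sequence: str, position: int, wild_type: str, mutant: str) -> str:
    """Return the sequence with the residue at a 0-based position substituted."""
    if sequence[position] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position]}"
        )
    return sequence[:position] + mutant + sequence[position + 1:]


def mutation_window(sequence: str, position: int, length: int = WINDOW_LENGTH) -> str:
    """Return a fixed-length window centred on a 0-based mutation position.

    Windows that extend past either end of the sequence are padded with PAD,
    so every window has exactly `length` characters.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the mutation sits at the centre")
    if not 0 <= position < len(sequence):
        raise IndexError("mutation position outside the sequence")
    half = length // 2
    left = sequence[max(0, position - half):position]
    right = sequence[position + 1:position + 1 + half]
    return PAD * (half - len(left)) + left + sequence[position] + right + PAD * (half - len(right))


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the amino-acid composition, ignoring padding."""
    residues = [aa for aa in window if aa != PAD]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not taken from the paper's dataset.
    seq = "MKTAYIAKQRQISFVKSHFSRQ" * 10
    pos = 100
    mutant_seq = apply_mutation(seq, pos, seq[pos], "A")
    # Inspect how composition entropy changes as the window grows.
    for length in (21, 61, 121, 181):
        window = mutation_window(mutant_seq, pos, length)
        print(length, round(shannon_entropy(window), 3))
```

Centring the window keeps the mutated residue at a fixed index, so per-residue embeddings from models such as ESM2 could be aligned across samples; whether EPEPDI does this is not stated in the abstract.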
