ESMDisPred: A Structure-Aware CNN-Transformer Architecture for Intrinsically Disordered Protein Prediction


Abstract

Intrinsically disordered proteins (IDPs) lack stable three-dimensional structures, yet play vital roles in key biological processes, including signaling, transcription regulation, and molecular scaffolding. Their structural flexibility presents significant challenges for experimental characterization and contributes to diseases such as cancer and neurodegenerative disorders. Accurate computational prediction of IDPs is therefore important for advancing drug discovery, structural biology, and protein engineering. In this study, we introduce ESMDisPred, a novel structure-aware disorder predictor that builds on the representational power of Evolutionary Scale Modeling-2 (ESM2) protein language models. ESMDisPred integrates sequence embeddings with structural information from the Protein Data Bank (PDB) to deliver state-of-the-art prediction accuracy. Model performance is further enhanced through feature engineering strategies, including terminal residue encoding, statistical summarization, and sliding-window analysis. To capture both local sequence motifs and long-range dependencies, we designed a hybrid CNN-Transformer architecture that balances convolutional efficiency with the representational power of self-attention. On the CAID3 benchmarks, our latest model achieves a ROC-AUC of 0.895, an average precision (AP) of 0.778, and a maximum F1 of 0.759, outperforming recent methods. Our results highlight the importance of integrating protein language model embeddings with explicit structural information for improved disorder prediction.
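The abstract describes a hybrid design in which convolutions capture local sequence motifs and self-attention captures long-range dependencies over per-residue embeddings. The paper does not give the architecture's exact configuration, so the following is only a minimal sketch of that general pattern in PyTorch; the class name, layer sizes, kernel width, and head count are all illustrative assumptions, not the authors' settings (the 1280-dimensional input matches the per-residue embedding size of the ESM2 650M model, but ESMDisPred may use a different variant or added structural features).

```python
import torch
import torch.nn as nn

class HybridDisorderNet(nn.Module):
    """Illustrative CNN-Transformer sketch for per-residue disorder
    scoring. All hyperparameters here are hypothetical."""

    def __init__(self, embed_dim=1280, conv_dim=256, nhead=8, nlayers=2):
        super().__init__()
        # 1D convolution over the sequence axis: local motif detector
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, conv_dim, kernel_size=7, padding=3),
            nn.ReLU(),
        )
        # Transformer encoder: long-range residue-residue dependencies
        enc_layer = nn.TransformerEncoderLayer(
            d_model=conv_dim, nhead=nhead,
            dim_feedforward=512, batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=nlayers)
        # Per-residue disorder probability
        self.head = nn.Linear(conv_dim, 1)

    def forward(self, x):
        # x: (batch, length, embed_dim) per-residue embeddings
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, length, conv_dim)
        h = self.transformer(h)
        return torch.sigmoid(self.head(h)).squeeze(-1)    # (batch, length)
```

A forward pass on a batch of embeddings, e.g. `HybridDisorderNet()(torch.randn(2, 100, 1280))`, yields one disorder score in [0, 1] per residue.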

Article activity feed

  1. To reduce class imbalance, we exclude these longer sequences from the training dataset.

    Downsampling/excluding a minority class is an interesting decision. Why not include some of them, e.g. via stratified sampling, curriculum learning, or a specialized subnetwork or set of heads trained on the longer sequences with a reasonable split? How well does the model generalize to longer sequences?
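    One of the alternatives raised above, stratified sampling by sequence length, can be sketched in a few lines of plain Python. This is only a hypothetical helper illustrating the reviewer's suggestion, not anything from the paper; the bin boundaries and split fraction are arbitrary.

    ```python
    import random
    from collections import defaultdict

    def stratified_split(records, bins=(100, 300, 1000), test_frac=0.2, seed=0):
        """Split (id, sequence) records into train/test so that every
        length bin, including the longest sequences, appears on both sides."""
        rng = random.Random(seed)
        by_bin = defaultdict(list)
        for rec in records:
            length = len(rec[1])
            # index of the first bin cap the sequence fits under
            idx = sum(length > cap for cap in bins)
            by_bin[idx].append(rec)
        train, test = [], []
        for members in by_bin.values():
            rng.shuffle(members)
            cut = max(1, int(len(members) * test_frac))  # keep >= 1 in test
            test.extend(members[:cut])
            train.extend(members[cut:])
        return train, test
    ```

    With such a split, long sequences are guaranteed to appear in the held-out set, so generalization to them can be measured rather than left unknown.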

  2. In our earlier work, we introduced DisPredict3.0, the most recent iteration of the DisPredict series, which integrates evolutionary representations derived from protein language models to improve the prediction of intrinsically disordered regions (IDRs) [5]. This approach achieved the top ranking on the Disorder NOX dataset in CAID2. Building on this foundation, we now present ESMDisPred, a structure-aware disordered protein predictor that incorporates embeddings from the Evolutionary Scale Modeling-2 (ESM2) language model [3]. ESM2 is widely regarded as a state-of-the-art protein language model and has demonstrated exemplary performance in protein structure prediction (ESMFold).

    This is interesting. Evolutionary context can evidently be highly informative here.