emb2dis: a novel protein disorder prediction tool based on ResNets, dilated convolutions & protein language models

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to technical difficulties and the high cost of experimental determination of disorder in proteins, combined with the exponential increase of unannotated protein sequences, the development of computational methods for disorder prediction became an active area of research in the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) to predict disorder from sequence. The emb2dis tool is a pre-trained model that receives as input a protein sequence, calculates its pLM embedding and passes it to a deep learning model. In contrast to existing approaches, emb2dis integrates informative sequence representations with a novel architecture that combines residual networks (ResNets) and dilated convolutions. This design effectively enlarges the receptive field of the convolution operation, enabling the model to better capture an extended context of each amino acid. At the output, emb2dis assigns a disorder propensity score to each residue in the sequence. The model was evaluated on datasets from the latest CAID3 blind benchmark for disorder prediction, where it achieved first place in the Disorder-PDB category, exhibiting strong performance with high AUC and F max scores. Additionally, it ranked among the top ten methods on the Disorder-NOX dataset. We provide a freely available web-demo for emb2dis and a source code repository for local installation.

Weblink for the tool

https://sinc.unl.edu.ar/web-demo/emb2dis/

The importance of the emb2dis tool is that it provides a new deep learning approach and significant improvements in the prediction of protein disorder, with a simple web interface and graphical output detailing per-residue disorder.

Article activity feed