AllerTrans: a deep learning method for predicting the allergenicity of protein sequences

Faezeh Sarlakifar
Hamed Malek
Najaf Allahyari Fard

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Allergens are a major concern in determining protein safety, especially with the growing use of recombinant proteins in new medical products. These proteins require a careful allergenicity assessment to guarantee their safety. However, traditional laboratory tests for allergenicity are expensive and time-consuming. To address this challenge, bioinformatics offers efficient and cost-effective alternatives for predicting protein allergenicity. Deep learning models offer a promising solution for this purpose. Recently, with the emergence of protein language models(pLMs), high-quality and impactful feature vectors can be extracted from protein sequences using these specialized language models. Although different computational methods can be effective individually, combining them could improve the prediction results. Given this hypothesis, can we develop a more powerful approach than existing methods to predict protein allergenicity? In this study, we developed an enhanced deep learning model to predict the potential allergenicity of proteins based on their primary structure represented as protein sequences. In simple terms, this model classifies protein sequences into allergenic or non-allergenic classes. Our approach utilizes two pLMs to extract distinct feature vectors for each sequence, which are then fed into a deep neural network (DNN) model for classification. Combining these feature vectors improves the results. Finally, we integrated our top-performing models using ensemble modeling techniques. This approach could balance the model’s sensitivity and specificity. Our proposed model demonstrates an improvement compared to existing models, achieving a sensitivity of 97.91%, a specificity of 97.69%, an accuracy of 97.80%, and an area under the receiver operating characteristic curve of 99% using the standard 5-fold cross-validation. The AllerTrans model has been deployed as a web-based prediction tool and is publicly accessible at: https://huggingface.co/spaces/sfaezella/AllerTrans.

Version published to 10.1093/biomethods/bpaf040
Jan 1, 2025
Version published to 10.1101/2024.08.09.607419 on bioRxiv
Aug 10, 2024

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluationsLatest version Dec 11, 2025
Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluationsLatest version Dec 11, 2025
Raman Spectroscopy of Protein-Polysaccharide Conjugates: A Comparative Study of Tree-Based Ensemble Models

This article has 3 authors:
1. Oksana A. Mayorova
2. Mariia S. Saveleva
3. Ekaterina S. Prikhozhdenko
This article has no evaluationsLatest version Dec 30, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

Raman Spectroscopy of Protein-Polysaccharide Conjugates: A Comparative Study of Tree-Based Ensemble Models