Predicting peptide aggregation with protein language model embeddings

Ethan Eschbach
Kristine Deibler
Deepa Korani
Sebastian Swanson

Abstract

Amyloid fibrils, a form of peptide aggregate, are associated with multiple diseases and hinder the development of therapeutics. Experimental characterization of aggregating peptides is resource-intensive, and the resulting data are scarce, which limits the development of accurate models. We present PALM (Predicting Aggregation with Language Model embeddings), a deep-learning model that uses transfer learning to predict aggregation from embeddings extracted from a pretrained protein language model (pLM). PALM is trained on the WaltzDB-2.0 dataset to classify peptides and to identify aggregation-prone regions within a sequence at single-residue resolution. Compared with existing models, it performs strongly on held-out experimental datasets. However, PALM fails to identify single mutations that increase the aggregation rate of the amyloid beta peptide; training the PALM architecture on a larger dataset, CANYA NNK1-3, substantially improves performance on this task. These results show that transfer learning with pLM embeddings improves performance when training on small datasets. They also highlight that challenging tasks, such as predicting the effect of single mutations, require more experimental data.
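The abstract states only that PALM predicts aggregation from frozen pLM embeddings and produces both a peptide-level classification and per-residue aggregation-prone regions. It does not give the architecture. The sketch below shows one generic way such a transfer-learning pipeline can be wired, and every component is an assumption. `stub_embeddings` is a hypothetical stand-in for a real pLM encoder: its vectors depend only on the amino-acid letter, so they carry none of the context a real pLM provides. The per-residue head is plain logistic regression with untrained, illustrative weights. The peptide-level score uses max pooling. Regions are defined as runs of at least `min_len` residues scoring at or above `threshold`. None of this is PALM's published method.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def stub_embeddings(seq: str, dim: int = 8) -> List[Vector]:
    """Hypothetical placeholder for a frozen pLM encoder (one vector per residue).

    Each vector is drawn from an RNG seeded by the amino-acid letter, so identical
    letters get identical vectors. A real pLM gives context-dependent embeddings.
    """
    vecs = []
    for aa in seq:
        rng = random.Random(aa)  # same letter -> same vector
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vecs


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residue_probabilities(emb: Sequence[Vector], w: Vector, b: float) -> List[float]:
    """Assumed per-residue logistic head: p_i = sigmoid(w . x_i + b)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in emb]


def sequence_probability(probs: Sequence[float]) -> float:
    """Peptide-level score by max pooling over residues (an assumed choice)."""
    return max(probs)


def aggregation_prone_regions(probs: Sequence[float], threshold: float = 0.5,
                              min_len: int = 3) -> List[Tuple[int, int]]:
    """0-based, end-exclusive spans of >= min_len consecutive residues with p >= threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a candidate region opens
        elif p < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the candidate region closes
    if start is not None and len(probs) - start >= min_len:
        regions.append((start, len(probs)))  # region running to the C-terminus
    return regions


if __name__ == "__main__":
    seq = "KLVFFAE"  # amyloid beta 16-22 fragment, used only as example input
    dim = 8
    rng = random.Random(0)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # untrained, illustrative weights
    probs = residue_probabilities(stub_embeddings(seq, dim), w, b=0.0)
    print([round(p, 2) for p in probs])
    print(sequence_probability(probs), aggregation_prone_regions(probs))
```

In an actual transfer-learning setup, the pLM stays frozen and only the small head is fit on labeled peptides, which is why this approach can suit small datasets. Max pooling flags a peptide when any single residue scores high. Mean pooling or attention pooling are equally plausible alternatives that the abstract does not rule out.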

Version published on bioRxiv (DOI: 10.1101/2025.09.26.678773), Sep 29, 2025

Related articles

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations. Latest version: Dec 24, 2025

Discovery of β-Sheet Peptide Assembly Codes via an Experimentally Validated Predictive Computational Platform

This article has 4 authors:
1. Wei Han
2. Hang Zheng
3. Ke Huang
4. Chi-Sing Lee
This article has no evaluations. Latest version: Jan 14, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025