Predicting peptide aggregation with protein language model embeddings

Abstract

Amyloid fibril formation, i.e., aggregation, is associated with multiple diseases and hinders the development of therapeutics. The experimental characterization of aggregating peptides is resource-intensive, limiting the size of labeled datasets. We present a deep-learning model, PALM (Predicting Aggregation with Language Model embeddings), that uses transfer learning to predict aggregation from pretrained protein language model (pLM) embeddings. PALM is trained on WaltzDB-2.0 to classify peptides and identify aggregation-prone regions within a sequence at single-residue resolution. In comparison to existing models, it exhibits strong performance on held-out experimental datasets. We find that PALM fails to identify single mutations that increase the rate of aggregation of amyloid beta peptide; however, training the PALM architecture on a larger dataset, CANYA NNK1-3, substantially improves performance in this task. These results show that transfer learning with pLM embeddings improves performance when training on small datasets, but highlight that challenging tasks, such as predicting the effect of single mutations, require more experimental data.
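
As a rough illustration of what prediction at single-residue resolution from pLM embeddings could look like, the sketch below applies a logistic head to each residue embedding, groups consecutive high-scoring residues into candidate aggregation-prone regions, and pools the residue scores into one peptide-level score. This is not the PALM architecture, which the abstract does not describe. The linear head, the max pooling, the 0.5 threshold, and the names residue_scores, prone_regions, and peptide_score are all assumptions made for illustration, and the embeddings are toy vectors standing in for the output of a frozen pretrained pLM.

```python
import math


def residue_scores(embeddings, weights, bias):
    """Apply a logistic head independently to each residue embedding.

    `embeddings` is a list of equal-length vectors, one per residue
    (hypothetical stand-ins for frozen pLM embeddings).
    """
    scores = []
    for vec in embeddings:
        z = sum(w * x for w, x in zip(weights, vec)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def prone_regions(scores, threshold=0.5, min_len=1):
    """Return (start, end) pairs, end exclusive, for runs of residues
    whose score is at or above `threshold` and at least `min_len` long."""
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


def peptide_score(scores):
    """Pool residue scores into one peptide-level score (max pooling is
    an assumption here, not a detail taken from the abstract)."""
    return max(scores)


# Toy example: three residues with 2-dimensional "embeddings".
toy_embeddings = [[0.0, 0.0], [10.0, 0.0], [-10.0, 0.0]]
toy_scores = residue_scores(toy_embeddings, weights=[1.0, 0.0], bias=0.0)
toy_regions = prone_regions(toy_scores)
```

In a transfer-learning setup of this kind, only the small head would be fitted on the labeled peptides while the pLM stays fixed, which is one way a model can be trained on a dataset as small as the one the abstract describes.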
