pLM-SAV: A Δ-Embedding Approach for Predicting Pathogenic Single Amino Acid Variants

Orsolya Gereben
Hedvig Tordai
Lana Khamisi
Erda Qorri
Tamás Hegedűs

Abstract

Predicting whether single amino acid variants (SAVs) in proteins lead to pathogenic outcomes is a critical challenge in molecular biology and precision medicine. Experimental determination of all possible mutation effects is infeasible, and while state-of-the-art tools such as AlphaMissense show promise, their diagnostic performance is insufficient and they are often difficult to run locally. We developed pLM-SAV, a simple yet effective predictor that leverages protein language models (pLMs). Δ-embeddings, computed as the difference between wild-type and mutant sequence embeddings, serve as input to a convolutional neural network. To prevent data leakage, we trained our model on the well-characterized, labeled Eff10k set and evaluated it on a non-homologous subset of ClinVar data. This approach performs exceptionally well on the Eff10k test folds and reasonably well on the ClinVar test sets. Notably, pLM-SAV excels at resolving variants for which AlphaMissense gives ambiguous predictions. We also found that an ensemble method, REVEL, outperforms both AlphaMissense and pLM-SAV; we therefore integrated REVEL-enhanced predictions into our widely used AlphaMissense web application. These findings show that an SAV predictor trained on labeled data can achieve high predictive performance. Unlike previous methods such as VESPA, pLM-SAV uses no handcrafted features or substitution matrices, relying solely on pLM-derived representations. We anticipate that incorporating Δ-embeddings into other mutation effect predictors or mutant structure prediction methods will further enhance their accuracy and utility in diverse biological contexts.
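
The Δ-embedding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_embed` is a hypothetical stand-in for a real protein language model, the embedding width, the sign convention (mutant minus wild-type), the example sequence, and the single hand-set convolution kernel are all assumptions made for illustration.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 4  # toy embedding width (assumption); real pLM embeddings are much wider


def toy_embed(seq: str) -> List[List[float]]:
    """Hypothetical stand-in for a pLM: returns one DIM-vector per residue.

    Each vector mixes a residue with its direct neighbours, so a single
    substitution also shifts the embeddings of adjacent positions.
    """
    idx = [AMINO_ACIDS.index(aa) + 1 for aa in seq]
    out = []
    for i in range(len(seq)):
        left = idx[i - 1] if i > 0 else 0
        right = idx[i + 1] if i < len(seq) - 1 else 0
        ctx = idx[i] + 0.5 * (left + right)
        out.append([math.sin(ctx * (k + 1) / 10.0) for k in range(DIM)])
    return out


def mutate(seq: str, pos: int, new_aa: str) -> str:
    """Apply a single amino acid substitution at a 1-based position."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position out of range")
    if new_aa not in AMINO_ACIDS:
        raise ValueError("unknown amino acid")
    return seq[: pos - 1] + new_aa + seq[pos:]


def delta_embedding(wt: str, mut: str) -> List[List[float]]:
    """Per-residue difference of embeddings (assumed sign: mutant - wild-type)."""
    if len(wt) != len(mut):
        raise ValueError("SAVs keep the sequence length unchanged")
    e_wt, e_mut = toy_embed(wt), toy_embed(mut)
    return [[m - w for w, m in zip(row_wt, row_mut)]
            for row_wt, row_mut in zip(e_wt, e_mut)]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]],
                bias: float = 0.0) -> List[float]:
    """One-channel 'valid' 1D convolution along the sequence axis, then ReLU.

    kernel has shape (width, DIM); a trained CNN would learn many such kernels.
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for j in range(width):
            s += sum(a * b for a, b in zip(x[start + j], kernel[j]))
        out.append(max(0.0, s))
    return out


wt = "MKTAYIAKQR"                 # made-up wild-type sequence
mut = mutate(wt, 5, "W")          # hypothetical variant Y5W
delta = delta_embedding(wt, mut)  # shape: (len(wt), DIM)
kernel = [[0.1] * DIM for _ in range(3)]  # hand-set, untrained kernel
features = conv1d_relu(delta, kernel)
```

In this toy setting the Δ-embedding is zero everywhere except at the mutated residue and its immediate neighbours, because `toy_embed` only looks one position to each side. A real pLM attends over the whole sequence, so the difference would generally be non-zero across many positions, which is the signal a convolutional network can learn from.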

Availability and Implementation

Freely available at https://doi.org/10.5281/zenodo.15502498 and https://alphamissense.hegelab.org.

Version published to 10.1101/2025.05.24.655916 on bioRxiv
May 28, 2025

Related articles

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations
Latest version Jan 21, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025
