Learning sequence to predict gain- or loss-of-function variants

Doyeon Ha
Sungnam Kim
Kisang Kwon
Wonseok Chung
Joohyun Han

Abstract

A clear understanding of mutational effects can advance genetics and biomedical research by providing insights into gene function, disease mechanisms, and therapeutic approaches. However, methods that determine the pathogenicity of genetic variants are limited because they provide no information on the direction of the mutational effect. Here, we present ClearVariant, a deep learning system that classifies pathogenic variants as gain- or loss-of-function, achieving state-of-the-art performance validated with data from ClinVar and the Human Gene Mutation Database (HGMD). The model uses protein language models (PLMs) trained on mutated sequences alongside their reference counterparts, and it produces similar predictions when a residue is changed to another amino acid in the same property group. We evaluated its ability to learn the protein language by observing high attention scores on coevolutionary relationships. To support advancements in biomedicine, we provide a database of pathogenic human missense variants labelled with their predicted mutational effects.

Version published to 10.21203/rs.3.rs-6705195/v1 on Research Square
Jun 6, 2025

Related articles

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations
Latest version Jan 21, 2026

GenBlosum: On Determining Whether Cancer Mutations Are Functional or Random

This article has 2 authors:
1. Alejandro Leyva
2. Muhammad Khalid Khan Niazi
This article has no evaluations
Latest version Dec 15, 2025
