Disease-specific variant pathogenicity prediction using multimodal biomedical language models

Yilin Liu
David N. Cooper
Haiyuan Yu

Abstract

Missense variants play a key role in the diagnosis of genetic disorders and in disease risk prediction. Existing methods focus primarily on the prediction of variant effects in terms of their deleteriousness, without taking into account the disease-specific context, and are therefore limited in terms of their utility in real-world diagnosis and decision making. Here, we introduce disease-specific variant pathogenicity prediction (DIVA), a novel deep learning framework that directly predicts specific disease types alongside the probability of deleteriousness for missense variants. Our approach integrates information from two different modalities – protein sequence and disease-related textual annotations – encoded using two pre-trained language models and optimized within a contrastive learning paradigm designed to align variants with relevant diseases in the learned representation space. Our results demonstrate that DIVA outperforms baselines and provides accurate disease predictions with high relevance to clinically curated disease annotations for missense variants. Variant deleteriousness prediction is enhanced by incorporating AlphaMissense scores through learnable weights derived from protein function annotations, which additionally boosts DIVA's ability to accurately classify deleterious variants. Our work provides new insights into variant pathogenicity prediction with awareness of disease specificity, addressing a hitherto unmet need in relation to clinical variant interpretation.
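
The contrastive alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, so a symmetric InfoNCE objective is assumed here, and the function names, the `temperature` value, and the use of plain NumPy arrays in place of language-model embeddings are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(variant_emb, disease_emb, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed objective, not confirmed by the source).

    Row i of variant_emb is treated as the positive pair of row i of
    disease_emb; every other row in the batch acts as a negative.
    """
    v = l2_normalize(variant_emb)
    d = l2_normalize(disease_emb)
    logits = v @ d.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(v))
    loss_v2d = -log_softmax(logits, axis=1)[idx, idx].mean()  # variant -> disease
    loss_d2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # disease -> variant
    return 0.5 * (loss_v2d + loss_d2v)


def rank_diseases(variant_emb, disease_emb):
    """For each variant, return disease indices sorted by decreasing cosine similarity."""
    sims = l2_normalize(variant_emb) @ l2_normalize(disease_emb).T
    return np.argsort(-sims, axis=1)


# Toy stand-ins for encoder outputs: four variants and four diseases in a 4-d space.
variants = np.eye(4)
diseases_aligned = np.eye(4)                # each variant matches its own disease
diseases_shuffled = np.eye(4)[[1, 2, 3, 0]]  # pairs deliberately mismatched

loss_aligned = contrastive_alignment_loss(variants, diseases_aligned)
loss_shuffled = contrastive_alignment_loss(variants, diseases_shuffled)
top_disease = rank_diseases(variants, diseases_aligned)[:, 0]
```

In this toy setup the loss is lower when each variant embedding sits next to its paired disease embedding than when the pairs are mismatched, which is the property a contrastive objective of this kind optimizes. Ranking diseases by similarity in the shared space is one plausible way to read off disease predictions; the abstract does not state how DIVA does this.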

Version published to 10.1101/2025.09.09.675184 on bioRxiv
Sep 15, 2025

Related articles

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations. Latest version: Jan 21, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations. Latest version: Feb 4, 2026

Personalized Disease Prediction Framework based on Genomic Variants and Disease Histories using Deep Embeddings and Alignment-based Process Conformance Checking

This article has 4 authors:
1. Daewoo Pak
2. Hyunwoo Jo
3. Seon Kim
4. Jongchan Kim
This article has no evaluations. Latest version: Jan 20, 2026
