Variantscape: Using Large Language Models to Build a Comprehensive Landscape of Cancer Variants for Precision Oncology

Marie Wosny
Maximilian Boesch
Tobias Peres
Thibault Niederhauser
Martin Früh
Christian Rothermundt
Janna Hastings

Abstract

Precision oncology depends on accurate interpretation of molecular variants, yet novel insights are often buried in unstructured literature and described using heterogeneous nomenclature. To address this, we developed “Variantscape,” an automated, large-scale pipeline and open-access web tool that integrates natural language processing and large language models to explore variant-cancer-treatment co-associations. Of over 2.7 million titles and abstracts processed, 7,524 mention all three entity types (variants, cancers, and treatments), spanning 4,029 unique variants, 98 cancer types, and 377 treatments. Co-occurrence and network analyses revealed 15,577 significant co-associations within a graph comprising 4,504 nodes and 48,470 edges. Canonical variants in common cancers, such as BRAF V600E, had high-confidence treatment associations, while some rare variants showed strong literature-derived signals. By automating discovery and co-association detection, “Variantscape” offers a systematic overview of the variant landscape in the literature, enabling scalable insight generation that supports hypothesis generation, uncovers underrecognized connections, reveals novel applications of existing therapies, and advances precision oncology.
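
To make the co-occurrence and network step more concrete, the following is a minimal sketch of one way such an analysis could be set up. It is not the authors' implementation: the toy records, the function names (`has_all_three`, `cooccurrence_counts`, `enrichment_p`), and the choice of a one-sided hypergeometric test for significance are all assumptions made for illustration, and the abstract does not state which statistical test the pipeline uses.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical toy annotations: each record lists the entities that an
# upstream NLP/LLM extraction step is assumed to have found in one abstract.
docs = [
    {"variant": {"BRAF V600E"}, "cancer": {"melanoma"}, "treatment": {"vemurafenib"}},
    {"variant": {"BRAF V600E"}, "cancer": {"melanoma"},
     "treatment": {"vemurafenib", "dabrafenib"}},
    {"variant": {"EGFR T790M"}, "cancer": {"lung cancer"}, "treatment": {"osimertinib"}},
    {"variant": {"BRAF V600E"}, "cancer": {"colorectal cancer"}, "treatment": set()},
    {"variant": set(), "cancer": {"melanoma"}, "treatment": {"dabrafenib"}},
]


def has_all_three(doc):
    """True if the record mentions at least one variant, cancer, and treatment."""
    return all(doc[kind] for kind in ("variant", "cancer", "treatment"))


def cooccurrence_counts(records):
    """Count documents per entity (nodes) and per cross-type entity pair (edges)."""
    node_counts = Counter()
    edge_counts = Counter()
    for doc in records:
        # Nodes are (kind, name) tuples; sorting gives each edge a canonical key.
        nodes = sorted((kind, name) for kind, names in doc.items() for name in names)
        node_counts.update(nodes)
        for a, b in combinations(nodes, 2):
            if a[0] != b[0]:  # only link entities of different types
                edge_counts[(a, b)] += 1
    return node_counts, edge_counts


def enrichment_p(n_ab, n_a, n_b, n_docs):
    """One-sided hypergeometric tail: P(X >= n_ab) if a and b were independent.

    Assumed significance test for this sketch only.
    """
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n_docs - n_a, n_b - k)
               for k in range(n_ab, upper + 1))
    return tail / comb(n_docs, n_b)


# Keep only records that mention all three entity types, then build the graph.
triples = [d for d in docs if has_all_three(d)]
node_counts, edge_counts = cooccurrence_counts(triples)

for (a, b), n_ab in sorted(edge_counts.items()):
    p = enrichment_p(n_ab, node_counts[a], node_counts[b], len(triples))
    print(f"{a[1]} <-> {b[1]}: {n_ab} docs, p = {p:.3f}")
```

On a real corpus, the edge list would typically be filtered by a multiple-testing-corrected threshold before being treated as a set of significant co-associations; with a handful of toy records the p-values here are illustrative only.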

Version published to 10.21203/rs.3.rs-6614711/v1 on Research Square
May 29, 2025

Related articles

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations. Latest version: Jan 21, 2026

Predicting gene expression from whole slide images in prostate cancer using deep learning

This article has 14 authors:
1. Anxuan Han
2. Bo Li
3. Chui Yan Mah
4. Jessica Logan
5. Yanan Wang
6. Ning Liu
7. Feargal Ryan
8. David Lynn
9. Darren Foreman
10. John O’Leary
11. Douglas Brooks
12. Jose Polo
13. Lisa Butler
14. Fuyi Li
This article has no evaluations. Latest version: Feb 4, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations. Latest version: Feb 4, 2026