Capturing the Mutational Dynamics of SARS-CoV-2 with Graphs

Badhan Das
Lenwood S. Heath


Abstract

The rapid evolution of SARS-CoV-2 presents significant challenges for modeling viral dynamics, driven by lineage diversification and region-specific mutation patterns. While phylogenetic trees are traditionally used for evolutionary inference, the massive volume of SARS-CoV-2 genomic data, with many similar sequences and few distinguishing mutations, poses computational and methodological limitations. The quasispecies theory instead models viral evolution as a cloud of mutants, motivating a graph-based representation that better captures the complexity of mutational events. Geographic variation adds another critical layer to this complexity. Mutation trends often differ across regions due to local transmission dynamics, host population structures, and selective pressures. In this study, we present the Mutation Learning Graph (MLG), a directed graph framework that organizes SARS-CoV-2 variants based on their cumulative mutation profiles relative to the reference genome (NC_045512.2), thereby capturing the dynamics of mutation propagation. This structure captures fine-grained mutational transitions and encodes plausible evolutionary relationships among variants. To construct these graphs, we introduce an alignment-aware mutation profiling method and a novel ANCESTOR JOINING algorithm, which incorporates ancestral variants as inferred intermediate nodes to connect observed genomes through biologically coherent mutational paths. We generate MLG datasets for ten geographically and epidemiologically diverse regions and benchmark them on two graph-based tasks: node-level lineage classification and edge-level mutational transition prediction. Using baseline graph neural network architectures (GCN, GraphSAGE, GAT, GGNN, VGAE), we demonstrate how mutation-centric graph structures expose key biological challenges, such as lineage imbalance and location-specific mutation spectra. 
For node classification, GraphSAGE and GGNN consistently achieved high accuracy (up to 0.96) and AUROC (up to 0.98). In contrast, VGAE and GraphSAGE led the way in link prediction, with AUPRCs of up to 0.96. These results highlight the effectiveness of MLG for capturing biologically meaningful mutation patterns and underscore the importance of localized, mutation-aware modeling for predicting viral mutations and future variant emergence.
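
To make the idea of a mutation-profile graph concrete, the sketch below builds a small directed graph in which each node is a set of mutations relative to the reference, and an edge points from a profile to one that strictly extends it. This is not the authors' ANCESTOR JOINING algorithm, whose details the abstract does not give. The pairwise-intersection rule for inferring ancestors, the function names `add_inferred_ancestors` and `build_edges`, and the mutation labels are all assumptions chosen for illustration.

```python
from itertools import combinations


def add_inferred_ancestors(profiles):
    """Return node profiles extended with pairwise shared-mutation sets.

    Assumption (not from the paper): a plausible common ancestor of two
    observed variants carries exactly the mutations they share.
    """
    nodes = {name: frozenset(muts) for name, muts in profiles.items()}
    seen = set(nodes.values())
    observed = sorted(nodes)
    for a, b in combinations(observed, 2):
        shared = nodes[a] & nodes[b]
        if shared not in seen:
            # Hypothetical naming scheme for inferred intermediate nodes.
            nodes[f"inferred_{len(nodes) - len(observed) + 1}"] = shared
            seen.add(shared)
    return nodes


def build_edges(nodes):
    """Add edge u -> v when v's profile strictly extends u's profile
    and no other node's profile lies strictly between the two."""
    edges = set()
    for u, pu in nodes.items():
        for v, pv in nodes.items():
            if pu < pv and not any(pu < pw < pv for pw in nodes.values()):
                edges.add((u, v))
    return edges


# Illustrative input: the reference (no mutations) and two toy variants.
profiles = {
    "ref": set(),
    "v1": {"A23403G", "C241T"},
    "v2": {"A23403G", "G28881A"},
}
nodes = add_inferred_ancestors(profiles)
edges = build_edges(nodes)
```

Here the two toy variants share one mutation, so a single inferred node carrying that mutation is placed between the reference and both variants. Node-level and edge-level learning tasks such as those named in the abstract would then operate on a graph of this general shape.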

Version published to 10.1101/2025.09.02.673804 on bioRxiv
Sep 3, 2025

Related articles

Dengue Virus Type 2: Global Epidemiology, Molecular Evolution, and Immune Response Insights

This article has 5 authors:
1. Qun Chen
2. Peipei Ye
3. Mengye Ma
4. Zhu Chen
5. Liming Jiang
This article has no evaluations. Latest version: Jan 30, 2026

Rapid Phylogenomic Analysis of Thousands Outbreak‐Causing Viral Genomes Using Covary

This article has 1 author:
1. Marvin I. De los Santos
This article has no evaluations. Latest version: Dec 22, 2025

Parallel adaptation and cryptic global expansion of Mycobacterium tuberculosis Lineage 3

This article has 5 authors:
1. Chendi Zhu
2. Zhaojun Wu
3. Mingxing Ni
4. Zhuofan Huang
5. Wei-Min Li
This article has no evaluations. Latest version: Jan 16, 2026
