PanVariants: Best Practice for Pangenome-based Variant Calling Pipeline and Framework

Heng Yi
Linqi Wang
Xinrui Chen
Yi Ding
Andrew Carroll
Pi-Chuan Chang
Kishwar Shafin
Lingyun Xu
Xiaojie Zeng
Xia Zhao
Meihua Gong
Xiaofang Wei
Yong Hou
Ming Ni

Abstract

Background: Although pangenome references capture more population diversity than linear references, current mainstream pangenome-based variant callers can detect only known variants already stored in the graph. To address this limitation, we developed PanVariants, a novel pipeline designed to accurately detect both known and novel variants. We systematically evaluated its performance against traditional linear-alignment solutions (BWA+GATK/Manta) and existing pangenome-aware solutions (DRAGEN/PanGenie) in three contexts: accuracy for small variants (SNVs/indels) and structural variants (SVs) in Genome in a Bottle samples, clinical detection in positive samples, and cohort-based joint calling.

Results: By integrating k-mer-based and mapping-based methods, PanVariants significantly reduced variant errors (FPs + FNs), achieving a 73% reduction compared to BWA+GATK and a 45% reduction compared to DRAGEN for SNVs. Retraining the DeepVariant model with high-quality DNBSEQ data further decreased errors by 15%. For SV detection, PanVariants attained an F1-score of 89.39%, markedly outperforming DRAGEN (68.18%) and BWA+Manta (58.33%) and approaching long-read sequencing performance (95.22%). In validation using clinical positive samples, PanVariants detected all expected pathogenic variants, whereas PanGenie failed to do so. In the cohort joint-calling analysis, PanVariants detected more variants, produced fewer Mendelian inheritance errors, and achieved better per-sample accuracy than GATK.

Conclusions: PanVariants establishes a robust framework and best-practice pipeline for pangenome-based variant detection, achieving both sensitive novel variant discovery and high accuracy for SNVs, indels, and SVs. Our systematic evaluation of optional processing steps and input variables offers practical guidance for users. Our findings, validated across diagnostic and population-based applications, strongly support the transition from linear to pangenome references in future genomics.
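
The abstract reports accuracy in two ways: as the total error count (FPs + FNs) relative to a baseline caller, and as an F1-score. As a minimal sketch of how such figures are conventionally derived from benchmark counts, the Python below uses the standard definitions. The function names and the counts in the usage note are hypothetical illustrations, not values or code from the preprint.

```python
def total_errors(fp: int, fn: int) -> int:
    """Total variant errors: false positives plus false negatives."""
    return fp + fn


def error_reduction(baseline_errors: int, new_errors: int) -> float:
    """Fractional reduction in errors relative to a baseline caller."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return 1.0 - new_errors / baseline_errors


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # no calls and no truth variants: define F1 as 0
    return 2 * tp / denominator


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
baseline = total_errors(fp=600, fn=400)   # 1000 errors
candidate = total_errors(fp=250, fn=150)  # 400 errors
reduction = error_reduction(baseline, candidate)
f1 = f1_score(tp=90, fp=10, fn=10)
```

With these illustrative counts, the candidate caller makes 400 errors against a baseline of 1000, a 60% reduction, and the F1-score is 180 / 200 = 0.90.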

Version published to 10.64898/2026.04.22.720142 on bioRxiv
Apr 24, 2026

Related articles

Pan1c : a pipeline to easily build chromosome-level pangenome graphs

This article has 8 authors:
1. Alexis Mergez
2. Martin Racoupeau
3. Philippe Bardou
4. Benjamin Linard
5. Fabrice Legeai
6. Frédéric Choulet
7. Christine Gaspin
8. Christophe Klopp
This article has no evaluations
Latest version Apr 21, 2026

Pansoma, a machine learning tool for identifying somatic variants using pangenome graphs

This article has 6 authors:
1. Jiawei Shen
2. Qichen Fu
3. Juan F. Macias-Velasco
4. Human Pangenome Reference Consortium
5. Daofeng Li
6. Ting Wang
This article has no evaluations
Latest version May 29, 2026

Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

This article has 7 authors:
1. Marina P. Marone
2. Erwang Chen
3. Axel Himmelbach
4. Georg Haberer
5. Manuel Spannagl
6. Nils Stein
7. Martin Mascher
This article has no evaluations
Latest version May 18, 2026