Anyone can be the best: Impact of diverse methodologies on the evaluation of structural variant callers

Luca Denti
Thomas Krannich
Tomas Vinar
Rayan Chikhi
Paola Bonizzoni
Brona Brejova
Fereydoun Hormozdiari

Abstract

Structural variants (SVs) are medium- and large-scale genomic alterations that shape phenotypic diversity and disease risk. Numerous methods have been proposed for discovering SVs; however, their benchmarking has been inconsistent across studies, often resulting in contradictory findings. One of the main sources of conflicting evaluation results is the lack of consistency in the SV callsets used as ground truth, which range from curated callsets released by consortia to more recent approaches that construct callsets from high-quality telomere-to-telomere de novo haplotype assemblies. The discrepancies between benchmarks are further compounded by the choice of the reference genome (GRCh37, GRCh38, and T2T-CHM13), where using T2T-CHM13 reveals a different deletion/insertion profile, indicating reduced reference bias. We evaluated the performance of several state-of-the-art methods for SV discovery from long-read whole-genome sequencing data and observed substantial variation in their performance and rankings, depending on the choice of ground truth, reference genome, and genomic regions used for evaluation. Counter-intuitively, the more complete reference genome T2T-CHM13 does not inherently solve the problem of SV benchmarking; instead, it reveals the limitations of each detection method in complex genomic regions. The substantial variation in detection accuracy across different genomic regions calls for additional caution in downstream analyses and in drawing conclusions based on predicted SVs. These findings underscore the complexity of evaluating SV detection methods and highlight the need for careful consideration and, ideally, field-standard best practices when reporting performance metrics.

Version published to 10.1101/2025.08.28.672546 on bioRxiv
Sep 1, 2025

Related articles

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations
Latest version Jan 29, 2026

Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

This article has 7 authors:
1. Can Luo
2. Yichen Liu
3. Han Liu
4. Zhenmiao Zhang
5. Lu Zhang
6. Brock Peters
7. Xin Maizie Zhou
This article has no evaluations
Latest version Jan 12, 2026

Benchmarking RNA-seq Tools for Real-World Diagnostic Applications

This article has 15 authors:
1. Sarah Silverstein
2. Kaushik Ganapathy
3. Sandra Donkervoort
4. Veronique Bolduc
5. Ying Hu
6. Justin Moy
7. Prech Uapinyoying
8. Svetlana Gorokhova
9. Vijay Ganesh
10. Ben Weisburd
11. Rotem OrBach
12. A. Reghan Foley
13. Pejman Mohammadi
14. David Adams
15. Carsten Bonnemann
This article has no evaluations
Latest version Jan 29, 2026
