Finding an optimal sequencing strategy to detect short and long genetic variants in a human genome

Robert J. M. Eveleigh
Sarah J. Reiling
J. Hector Galvez
Mathieu Bourgey
Jiannis Ragoussis
Guillaume Bourque

Abstract

Advances in DNA sequencing have transformed genomics, enabling comprehensive insights into human genetic variation. While short-read sequencing (SRS) remains dominant due to its high accuracy and affordability, its limitations in complex genomic regions have spurred the adoption of long-read sequencing (LRS) platforms, such as those from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Despite these advances, there is still a lack of systematic, large-scale benchmarking of variant calling performance across diverse platforms, variant types, sequencing depths, and genomic contexts. Here, we present a comprehensive benchmark of sequencing technologies and variant calling algorithms, evaluating their performance in detecting single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and structural variants (SVs). We show that while SRS combined with DeepVariant or DRAGEN offers excellent small variant detection in well-mapped regions, LRS technologies significantly outperform SRS in complex regions and SV detection. PacBio achieves high SNP and smaller SV accuracy even at moderate coverage, while ONT excels in detecting large SVs. The SV callers Dysgu and SVIM emerged as top performers across LRS datasets. Our results highlight that no single platform is optimal for all variant types or regions: SRS remains optimal for high-throughput small variant detection in accessible regions, whereas LRS is critical for capturing SVs and resolving difficult-to-map loci. These findings offer practical guidance for selecting sequencing technologies, coverage and variant calling strategies tailored to specific research or clinical goals, contributing to more accurate and cost-effective genomic analyses.

Highlights

● State-of-the-art short variant calling algorithms perform comparably but differ in how they balance precision and sensitivity.

● Short-read technologies outperform long-read technologies for small variants (SNPs and indels), with the exception of difficult-to-map variants.

● To capture the majority of variants, a minimum coverage of 15x for PacBio, 20x for SRS, or 30x for ONT is required. However, optimal coverage depends on zygosity, variant type, and the region of interest.

● Long-read technologies outperform short-read technologies for deletions and insertions in all size categories, across all validation sets tested.
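
The highlights refer to precision, sensitivity, and minimum coverage per platform. As a minimal sketch (not the authors' benchmarking pipeline), the Python below shows how precision and sensitivity are conventionally computed from true-positive, false-positive, and false-negative counts, and how the coverage thresholds quoted in the highlights could be encoded as a simple lookup. The function names, the `MIN_COVERAGE` table, and the example counts are illustrative assumptions; only the threshold values (15x, 20x, 30x) come from the text.

```python
# Illustrative sketch only; names and example counts are hypothetical.

# Minimum coverage quoted in the highlights. The values come from the text;
# the dictionary itself is just a convenient way to hold them.
MIN_COVERAGE = {"PacBio": 15, "SRS": 20, "ONT": 30}


def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called (recall): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def meets_min_coverage(platform: str, coverage: float) -> bool:
    """True if the coverage reaches the threshold quoted for the platform."""
    return coverage >= MIN_COVERAGE[platform]


if __name__ == "__main__":
    # Made-up counts for a single caller on a single truth set.
    tp, fp, fn = 90, 10, 30
    print(f"precision   = {precision(tp, fp):.3f}")
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"F1          = {f1_score(tp, fp, fn):.3f}")
    print("PacBio at 15x sufficient:", meets_min_coverage("PacBio", 15))
    print("ONT at 20x sufficient:", meets_min_coverage("ONT", 20))
```

Two callers can reach a similar F1 while one favors precision and the other sensitivity, which is the kind of difference the first highlight describes. As the highlights note, the thresholds are a floor for capturing the majority of variants; the optimal coverage still depends on zygosity, variant type, and region.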

Version published to 10.1101/2025.05.30.656631 on bioRxiv
May 30, 2025

