Accelerated, Accurate, Hybrid Short and Long Reads Alignment and Variant Calling


Abstract

Background

Integrating short-read and long-read sequencing technologies has become a promising approach for achieving accurate and comprehensive genomic analysis. While short-read sequencing (e.g., Illumina) offers high base accuracy and cost efficiency, it struggles with structural variation (SV) detection and complex genomic regions. In contrast, long-read sequencing (PacBio HiFi) excels at resolving large SVs and repetitive sequences but is limited by lower throughput, higher error rates (especially for Indels), and higher sequencing costs. Hybrid approaches combine these technologies, leveraging their complementary strengths and distinct sources of error to provide higher accuracy and more comprehensive results, and to raise throughput by lowering the coverage required of the long reads.

Methods

This study benchmarks the DNAscope Hybrid pipeline, a novel integrated alignment and variant calling framework that combines short- and long-read data sequenced from the same sample. We evaluate its performance across multiple human genome reference datasets (HG002–HG004) using the draft Q100 and Genome in a Bottle v4.2.1 benchmarks. The pipeline’s ability to detect small variants (SNPs/Indels), structural variants (SVs), and copy number variations (CNVs) is assessed using data from the Illumina and PacBio sequencing systems at varying read depths (5x–30x). Benchmark results are compared with those of DeepVariant.
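Variant calling benchmarks of this kind are commonly summarized with precision, recall, and F1 score, computed from counts of true-positive, false-positive, and false-negative calls against a truth set. The abstract does not state which metrics the study reports, so the following is only a minimal sketch under that assumption; the `Counts` class, the function names, and the numbers are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical tally of variant calls against a truth set (illustrative only)."""
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth-set variants that were not called


def precision(c: Counts) -> float:
    """Fraction of called variants that are correct."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: Counts) -> float:
    """Fraction of truth-set variants that were called."""
    truth = c.tp + c.fn
    return c.tp / truth if truth else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Placeholder counts, not results from the study.
example = Counts(tp=990, fp=10, fn=20)
```

Calling `precision(example)`, `recall(example)`, and `f1(example)` gives the three summary values for one call set; comparing two pipelines would mean computing them for each pipeline against the same truth set.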

Results

The DNAscope Hybrid pipeline significantly improves SNP and Indel calling accuracy, particularly in complex genomic regions. At lower long-read depths (e.g., 5x–10x), the hybrid approach outperforms standalone short- or long-read pipelines run at full sequencing depths (30x–35x). Additionally, DNAscope Hybrid outperforms leading open-source tools for SV and CNV detection, enhancing variant discovery in challenging genomic regions. The pipeline also demonstrates clinical utility by identifying disease-associated variants. Moreover, DNAscope Hybrid is highly efficient, achieving runtimes of less than 90 minutes on a single standard instance.

Conclusion

The DNAscope Hybrid pipeline is a computationally efficient, highly accurate variant calling framework that leverages the advantages of both short- and long-read sequencing. By improving variant detection in challenging genomic regions and offering a robust solution for clinical and large-scale genomic applications, it holds significant promise for genetic disease diagnostics, population-scale studies, and personalized medicine.
