Accelerated and High-Accuracy Variant Calling on Oxford Nanopore Technologies Sequencing Data with the Sentieon DNAscope LongRead and Hybrid Pipelines

Abstract

Oxford Nanopore Technologies (ONT) has emerged as a clinically meaningful long-read sequencing platform, enabled by recent improvements in read accuracy and throughput. However, variant calling on ONT data remains challenging because of platform-specific error profiles, particularly indels in homopolymer regions. Here, we describe the Sentieon DNAscope LongRead and DNAscope Hybrid pipelines and present comprehensive benchmarks on ONT whole-genome datasets, demonstrating high accuracy and substantially reduced computational requirements for SNP, Indel, and structural variant (SV) detection.

Benchmarked against multiple truth sets, including GIAB v4.2.1, CMRG, and T2T-Q100, DNAscope LongRead reduced SNP errors by ∼50% relative to Clair3 across five reference samples and showed consistently higher accuracy in all stratifications except long homopolymers. Integrating ONT long reads with Illumina short reads via DNAscope Hybrid further improved variant detection: on low-depth ONT + Illumina datasets, the pipeline achieved F1 scores of 0.9992 (SNPs) and 0.9979 (Indels), outperforming alternative pipelines. In challenging benchmarks such as T2T-Q100 and the CMRG genes, DNAscope Hybrid (DS-Hybrid) significantly reduced SNP and Indel errors compared with the next-best methods. For structural variants, the DNAscope pipelines outperformed alternative callers (e.g., Sniffles2), highlighting the advantages of haplotype-resolved SV detection.
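The F1 scores and error reductions above follow standard small-variant benchmarking arithmetic. As a minimal sketch, assuming "errors" means false positives plus false negatives (a common convention in benchmarking papers; the abstract does not define it), F1 and a relative error reduction can be computed as below. The function names and all counts are hypothetical and are not results from this study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall (hypothetical helper)."""
    precision = tp / (tp + fp)  # fraction of called variants that are true
    recall = tp / (tp + fn)     # fraction of truth variants that were called
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(errors_new: int, errors_baseline: int) -> float:
    """Fractional drop in errors (assumed FP + FN) relative to a baseline caller."""
    return 1 - errors_new / errors_baseline


# Hypothetical counts for illustration only, NOT values from the paper.
baseline_errors = 2_000 + 2_000  # FP + FN for a baseline caller
new_errors = 1_000 + 1_000       # FP + FN for a new caller
print(f"F1 = {f1_score(3_000_000, 1_000, 1_000):.5f}")
print(f"error reduction = {relative_error_reduction(new_errors, baseline_errors):.0%}")
```

Because F1 can be rewritten as 2TP / (2TP + FP + FN), the quantity 1 − F1 is roughly proportional to the error count when TP is large, so halving errors moves F1 only in the fourth or fifth decimal place at these accuracy levels.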

Alongside this accuracy, the Sentieon software suite remains computationally efficient: a complete ONT long-read FASTQ-to-VCF analysis finished in ∼190 minutes on a 120-vCPU Azure instance, at a cost of less than US$5.
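The runtime and cost figures together imply an upper bound on the instance price, since cost = hourly rate × runtime. A minimal sketch of that arithmetic follows. The abstract does not state the actual Azure instance type or price, so the function name is hypothetical and the output is only the bound implied by the ∼190 minute and $5 figures.

```python
def implied_max_hourly_rate(max_cost_usd: float, runtime_minutes: float) -> float:
    """Upper bound on hourly instance price implied by a cost cap and a runtime (hypothetical helper)."""
    runtime_hours = runtime_minutes / 60
    return max_cost_usd / runtime_hours


rate = implied_max_hourly_rate(5.0, 190)  # 5 / (190 / 60), about 1.58
per_vcpu = rate / 120                     # spread over the 120 vCPUs
print(f"implied instance price < ${rate:.2f}/hour")
print(f"implied price per vCPU-hour < ${per_vcpu:.4f}")
```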

Overall, the DNAscope LongRead and Hybrid pipelines deliver fast, accurate, and scalable germline variant calling for ONT datasets, providing a practical solution for comprehensive whole-genome analysis in both research and clinical genomics.