Rovaca: A highly optimized variant caller for rapid and robust germline DNA analysis
Abstract
The Genome Analysis Toolkit (GATK) HaplotypeCaller is the gold standard for germline variant detection, but its computational intensity presents a significant bottleneck in large-scale genomic analyses. While accelerated solutions exist, they often require specialized hardware or are proprietary, limiting accessibility.
We present Rovaca, a novel, pure-software variant caller implemented in C++ that dramatically accelerates germline analysis on standard CPU hardware. By preserving the core mathematical models of GATK HaplotypeCaller, Rovaca produces nearly identical output, ensuring high concordance and facilitating adoption. Performance gains come from a producer-consumer multi-threading architecture and targeted micro-architectural optimizations of the PairHMM algorithm using AVX-512 instructions. Benchmarks on standard GIAB datasets show that Rovaca is 57–76× faster than GATK for Whole-Genome Sequencing and 30× faster for Whole-Exome Sequencing. The F1-scores for both SNPs and INDELs consistently differ from GATK's by less than 0.01%, demonstrating exceptional accuracy. Rovaca is a deterministic, robust, and cost-efficient tool that removes the computational barrier for high-throughput germline variant analysis.
Rovaca is open-source and available at https://github.com/ZephyRoy/Rovaca.
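The abstract names a producer-consumer multi-threading architecture but does not describe it. As a generic illustration of that pattern only, and not Rovaca's actual implementation, the following minimal C++17 sketch has one producer feed region indices into a bounded queue while several workers drain it. All names here (`BoundedQueue`, `process_region`, `run_pipeline`) are hypothetical, and the per-region work is a placeholder computation rather than variant calling. Each result is written to a slot indexed by region, so the output does not depend on thread scheduling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Bounded, thread-safe FIFO queue (hypothetical helper, not a Rovaca type).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Consumer side: blocks while the queue is empty and still open.
    // Returns std::nullopt once the queue is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return item;
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;
    bool closed_ = false;
};

// Placeholder for per-region work (assumption: stands in for real analysis).
long process_region(int region_id) {
    return static_cast<long>(region_id) * region_id;
}

// One producer (the calling thread) and n_workers consumers.
std::vector<long> run_pipeline(int n_regions, int n_workers) {
    assert(n_regions >= 0 && n_workers > 0);
    BoundedQueue<int> queue(8);
    std::vector<long> results(static_cast<std::size_t>(n_regions), 0);

    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&queue, &results] {
            // Each region index is popped by exactly one worker, so every
            // worker writes to distinct slots and no extra locking is needed.
            while (auto region = queue.pop()) {
                results[static_cast<std::size_t>(*region)] = process_region(*region);
            }
        });
    }

    for (int r = 0; r < n_regions; ++r) queue.push(r);
    queue.close();
    for (auto& t : workers) t.join();
    return results;
}
```

Calling `run_pipeline(100, 4)` processes 100 placeholder regions on four worker threads. The bounded capacity keeps the producer from running arbitrarily far ahead of the consumers, which caps memory use; on most toolchains the sketch needs to be built with thread support enabled (for example `-pthread` on GCC or Clang).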