Aardvark: Sifting through differences in a mound of variants
Abstract
Variant benchmarking is critical for assessing the accuracy of genomic secondary analysis pipelines. However, traditional benchmarking tools that require exact genotype matches introduce biases from variant representation and are ill-suited to tandem repeats and structural variation. We describe Aardvark, a variant benchmarking tool that introduces the basepair score to directly compare haplotype sequences, reducing representation biases while allowing for partial credit scoring. The tool also includes a traditional genotype score and supports separate or joint benchmarking of small variants, tandem repeats, and structural variants (<10 kb). Aardvark accepts standard inputs, runs ≈16x faster than hap.py, and is freely available and open source (https://github.com/PacificBiosciences/aardvark).
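To make the idea of representation bias concrete, the following is a minimal toy sketch and not Aardvark's actual algorithm or scoring formula. It assumes variants are simple (position, ref, alt) tuples applied to a short reference string, and it uses a plain edit distance as a stand-in for a sequence-level score. All names here (`apply_variants`, `edit_distance`, `toy_partial_credit`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical variant record: (0-based position, ref allele, alt allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a haplotype sequence by applying non-overlapping variants."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged reference segment
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def toy_partial_credit(reference: str, truth: List[Variant],
                       query: List[Variant]) -> float:
    """Toy score (an assumption for illustration): the fraction of the
    truth haplotype's divergence from the reference that the query recovers."""
    truth_hap = apply_variants(reference, truth)
    query_hap = apply_variants(reference, query)
    total = edit_distance(reference, truth_hap)
    remaining = edit_distance(query_hap, truth_hap)
    if total == 0:
        return 1.0 if remaining == 0 else 0.0
    return max(0.0, 1.0 - remaining / total)


REFERENCE = "GGACACACTT"

# The same one-unit "AC" deletion, written at two positions in the repeat.
left_aligned = [(1, "GAC", "G")]
right_shifted = [(5, "CAC", "C")]

# An exact record comparison treats these as different calls...
records_match = left_aligned == right_shifted
# ...but the haplotype sequences they produce are identical.
haplotypes_match = (apply_variants(REFERENCE, left_aligned)
                    == apply_variants(REFERENCE, right_shifted))

# Truth deletes two repeat units; the query deletes only one.
two_unit_deletion = [(1, "GACAC", "G")]
partial = toy_partial_credit(REFERENCE, two_unit_deletion, left_aligned)
```

In this toy setting the two deletion records disagree as tuples yet yield the same haplotype, which is the kind of mismatch an exact-match comparison would penalize. The query that removes one of two deleted repeat units receives a score between 0 and 1 rather than being counted as entirely wrong.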