SaVor - A Reproducible Structural Variant Calling and Benchmarking Platform from Short-Read Data


Abstract

Structural variations (SVs) are genomic differences between individuals that span more than 1 kilobase pair (Kbp). They can arise from errant DNA repair mechanisms, whole genome duplications, and transposable element activity across the genome. Recent advances, optimizations, and cost reductions in next-generation sequencing technologies have driven an exponential increase in the amount of available short-read genomic data. Here we present SaVor, a flexible, reproducible SV calling workflow that accepts single- or multi-lane paired-end Illumina short-read sequence data, or BAM files, as input and generates a consensus SV call-set based on user-provided merge parameters. We tested SaVor on 1,165 Arabidopsis thaliana whole genome sequences and benchmarked its performance against a set of SVs derived from the same accessions using Lumpy. Intersection calls, i.e. SVs supported by 3 SV callers, showed the highest precision (>0.91), while union calls, supported by at least 1 caller, showed the highest recall (>0.88). However, the former suffer from decreased recall (<0.51) and the latter from decreased precision (<0.57). Depending on the merge strategy, trade-offs between recall and precision therefore need to be considered in downstream analyses of SV call-sets from short-read data. SaVor is an open-source Snakemake pipeline and is available on GitHub at https://github.com/ChabbyTMD/SaVor
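
The precision and recall trade-off between intersection and union call-sets can be illustrated with a minimal sketch. This is not SaVor's implementation: the function names, the toy coordinates, and the truth set below are all hypothetical, and SVs are matched by exact coordinates, whereas real merge tools typically allow some breakpoint tolerance.

```python
from collections import Counter

def merge_calls(callsets, min_support):
    """Keep SVs reported by at least `min_support` callers.

    Assumption: two calls are the same SV only if chromosome, start,
    end and type match exactly (a simplification for illustration).
    """
    support = Counter(sv for calls in callsets for sv in set(calls))
    return {sv for sv, n in support.items() if n >= min_support}

def precision_recall(called, truth):
    """Precision and recall of a call-set against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical toy calls from three callers: (chrom, start, end, type).
caller_a = {("Chr1", 1000, 5000, "DEL"),
            ("Chr1", 20000, 26000, "DUP"),
            ("Chr2", 300, 4300, "DEL")}
caller_b = {("Chr1", 1000, 5000, "DEL"),
            ("Chr1", 20000, 26000, "DUP"),
            ("Chr3", 700, 2700, "INV")}
caller_c = {("Chr1", 1000, 5000, "DEL"),
            ("Chr4", 50, 3050, "DEL")}

# Hypothetical truth set used for benchmarking.
truth = {("Chr1", 1000, 5000, "DEL"),
         ("Chr1", 20000, 26000, "DUP"),
         ("Chr2", 300, 4300, "DEL"),
         ("Chr5", 10, 2010, "DEL")}

callsets = [caller_a, caller_b, caller_c]
union_calls = merge_calls(callsets, min_support=1)         # at least 1 caller
intersection_calls = merge_calls(callsets, min_support=3)  # all 3 callers

union_pr = precision_recall(union_calls, truth)
intersection_pr = precision_recall(intersection_calls, truth)

print("union        precision=%.2f recall=%.2f" % union_pr)
print("intersection precision=%.2f recall=%.2f" % intersection_pr)
```

In this toy data the union keeps more true SVs but also more false positives, while the intersection keeps only the call shared by every caller, which mirrors the direction of the trade-off described in the abstract.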