Accurate detection of pathogenic structural variants guided by multi-platform comparison

Abstract

Structural variants (SVs) are a common cause of human disease and contribute greatly to inter-individual variability. Detecting them is a significant challenge because of their diversity, size, and enrichment in repetitive regions. High-quality long-read technologies can now mitigate the majority of these challenges. However, many downstream applications, ranging from clinical diagnostics and genome-wide association studies to large-scale aggregation of population variants, continue to rely on short-read sequencing data because of its high throughput and cost-effectiveness. SV detection from short-read sequencing therefore remains a persistent and relevant obstacle. We created dicast, a machine-learning method that improves the identification of true-positive SVs from short-read sequencing using genomic context and alignment features. Dicast is driven by a novel and comprehensive benchmark call set created by combining several sequencing technologies with rigorous manual curation. This benchmark set served as the basis for a systematic evaluation of five sequencing platforms and fifteen SV detection methods across different SV classes, sizes, and genomic contexts, enabling us to quantify the strengths and weaknesses of each and to inform the training of our model. Leveraging these insights, dicast outperforms state-of-the-art short-read-based callers and consensus approaches, identifying considerably more true-positive variants while maintaining high precision. We also demonstrate the method's applicability in diagnostic scenarios using putative pathogenic candidates in a limb malformation cohort, as well as known pathogenic variants in atrial fibrillation and neuromuscular disease cohorts. Dicast identifies all known pathogenic variants and 20% more manually confirmed candidate deletions than a consensus approach, and can therefore be used to reliably reduce time-consuming manual inspection in diagnostics.
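
To make the benchmark comparison concrete, the following is a minimal sketch of how a set of deletion calls might be scored against a curated benchmark set. It is not the dicast implementation: the 50% reciprocal-overlap matching rule, the `Deletion` type, the `evaluate` function, and all coordinates are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical deletion record with half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def evaluate(calls, benchmark, min_overlap=0.5):
    """Score calls against a benchmark; min_overlap=0.5 is an assumed threshold."""
    matched_truth = set()
    true_positive_calls = 0
    for call in calls:
        hits = [
            i for i, truth in enumerate(benchmark)
            if reciprocal_overlap(call, truth) >= min_overlap
        ]
        if hits:
            true_positive_calls += 1
            matched_truth.update(hits)
    precision = true_positive_calls / len(calls) if calls else 0.0
    recall = len(matched_truth) / len(benchmark) if benchmark else 0.0
    return precision, recall


# Made-up coordinates, for illustration only.
benchmark = [
    Deletion("chr1", 1000, 2000),
    Deletion("chr1", 5000, 5600),
    Deletion("chr2", 300, 900),
]
calls = [
    Deletion("chr1", 1050, 2010),  # matches the first benchmark deletion
    Deletion("chr1", 5400, 6400),  # overlaps too little to count as a match
    Deletion("chr2", 280, 910),    # matches the third benchmark deletion
    Deletion("chr3", 100, 400),    # no benchmark deletion on this chromosome
]

precision, recall = evaluate(calls, benchmark)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example two of the four calls match a benchmark deletion and two of the three benchmark deletions are recovered. A real evaluation would additionally stratify by SV class, size, and genomic context, as the abstract describes.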
