Current Structural Variant Calling Biases Compromise Clinical Genome Diagnostics
Abstract
Structural variants (SVs) account for a substantial proportion of pathogenic mutations in rare diseases, yet their detection remains challenging in clinically relevant coding regions. We systematically benchmarked SV detection across Illumina whole-genome sequencing (WGS), Illumina whole-exome sequencing (WES), Oxford Nanopore Technologies (ONT) WGS, and PacBio WGS using the HG002 Genome in a Bottle (GIAB) truth set on GRCh37 and GRCh38. Performance was assessed within three nested interval sets: high-confidence intervals (HCI), a pediatric disorders gene panel (GP), and exons plus UTRs (EX + UTR). To separate the effect of genomic context from that of interval size, we developed a novel simulation framework that generates exon-like target sets outside coding regions. Long-read platforms outperformed short-read sequencing across all intervals, with PacBio Pbsv achieving F1 = 0.94 in HCI versus 0.62 for Illumina WGS. SV detection performance was consistently lower in EX + UTR regions than in their simulated counterparts, indicating a systematic bias beyond interval size effects. On GRCh38, region-dependent performance gaps persisted, although they were less pronounced and predominantly affected precision. These results demonstrate that SV detection accuracy is both technology- and region-dependent, underscoring the need for interval-specific benchmarking in clinical genomics.
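The abstract does not spell out how the exon-like target sets are generated, so the following is only a minimal sketch of one way the idea could work, not the authors' framework. It assumes a single chromosome, half-open `(start, end)` intervals, and simple rejection sampling; the function names `simulate_matched_intervals`, `overlaps`, and `f1_score` are hypothetical. Each target interval gets a randomly placed counterpart of identical length that avoids the excluded (coding) regions, so any performance difference between the two sets cannot be attributed to interval size. The F1 helper uses the standard definition as the harmonic mean of precision and recall.

```python
import random


def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def simulate_matched_intervals(targets, excluded, chrom_length, seed=0, max_tries=1000):
    """Place one random interval per target with the same length.

    Simulated intervals avoid every region in `excluded` and do not
    overlap each other. All coordinates are half-open (start, end)
    positions on a single chromosome of length `chrom_length`.
    Illustrative assumption only: the source does not describe its sampler.
    """
    rng = random.Random(seed)
    placed = []
    for start, end in targets:
        length = end - start
        for _ in range(max_tries):
            s = rng.randrange(0, chrom_length - length + 1)
            candidate = (s, s + length)
            if not any(overlaps(candidate, r) for r in excluded + placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError(f"could not place an interval of length {length}")
    return sorted(placed)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from SV call counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    exons = [(100, 250), (400, 460), (900, 1200)]  # toy coordinates
    simulated = simulate_matched_intervals(exons, excluded=exons, chrom_length=10_000)
    print(simulated)
    print(f1_score(tp=8, fp=2, fn=2))
```

In practice one would also match properties such as chromosome and local sequence context, and an established tool for interval shuffling could replace the sampler; the sketch only illustrates the size-matching step.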