HitSV: Maximizing discovery of structural variants across sequencing technologies
Abstract
Structural variants (SVs) are a major source of genomic diversity, yet their discovery remains challenging because of repetitive genomic contexts, alignment ambiguity, and the trade-off between sequencing cost and read length. Here we introduce HitSV, which substantially improves SV discovery by combining repetitiveness- and signature-density-aware breakpoint recognition with precise haplotype-resolved local assembly, thereby enabling base-resolution SV reconstruction and genotyping across sequencing technologies. Across different coverages, HitSV is 12-68% (long-read), 3-36% (short-read), and 13% (hybrid sequencing) more accurate than state-of-the-art SV callers. Applying HitSV to the 1KGP Phase 4 cohort, we identified 31.5% more SVs, substantially reshaping allele-frequency landscapes. Notably, analysis of a large Chinese long-read cohort uncovered tandem repeat–mobile element composite arrays as a prevalent, multi-allelic class of complex SVs, highlighting composite repeat architectures as a fundamental hallmark of human genomes.