HitSV: Maximizing discovery of structural variants across sequencing technologies

Abstract

Structural variants (SVs) are a major source of genomic diversity, yet their discovery remains challenging because of repetitive genomic contexts, alignment ambiguity, and the trade-off between sequencing cost and read length. Here we introduce HitSV, which substantially improves SV discovery by combining repetitiveness- and signature-density-aware breakpoint recognition with precise haplotype-resolved local assembly, enabling base-resolution SV reconstruction and genotyping across sequencing technologies. Across different coverages, HitSV is 12%-68% more accurate than state-of-the-art SV callers on long-read data, 3%-36% more accurate on short-read data, and 13% more accurate on hybrid-sequencing data. Applying HitSV to the 1KGP Phase 4 cohort, we identified 31.5% more SVs, substantially reshaping allele-frequency landscapes. Notably, analysis of a large Chinese long-read cohort uncovers tandem repeat–mobile element composite arrays as a prevalent and multi-allelic class of complex SVs, highlighting composite repeat architectures as a fundamental hallmark of human genomes.
