A sensitive and accurate framework for population-scale structural variant discovery and genotyping across sequence types
Abstract
Structural variation underlies much phenotypic and evolutionary diversity. However, accurate discovery and genotyping of structural variants (SVs) at population scale remain challenging due to the varied characteristics of sequencing technologies and the complexity of genome architectures. Here, we introduce PSVGT, a unified framework that integrates short- and long-read data, de novo assembled contigs, and chromosome-level assemblies to enable comprehensive SV detection and genotyping across diploid and polyploid genomes. PSVGT employs an integrated signaling module to extract precise insertion and deletion breakpoints, coupled with the ploidy-aware KLOOK clustering algorithm and local depth-adaptive filtering to resolve multi-allelic events and accommodate the uneven coverage characteristic of complex genomic regions. Benchmarking on simulated and real datasets demonstrates that PSVGT consistently outperforms state-of-the-art tools across sequence types, with particular advantages in complex genomes and low-coverage long-read data. PSVGT fills a critical gap in scalable SV analysis by leveraging underutilized short-read data and enabling robust characterization of SVs across diverse genome architectures, from diploids to polyploids, thereby facilitating population-scale analyses and pan-genome research.