SDFA: A Standardized Decomposition Format and Toolkit for Efficient Analysis of Structural Variants in Large-scale Population Genomic Studies
Abstract
Structural variants (SVs) contribute significantly to genetic diversity yet present computational challenges during analysis. We introduce SDFA, a standardized decomposition format and toolkit for efficient analysis of SVs in large-scale population genomics. SDFA efficiently stores and retrieves all SV types while providing algorithms for consistent SV merging, memory-efficient annotation, and precise gene feature annotation across large cohorts. SDFA outperforms existing tools, achieving at least 17.64 times faster merging and 120.93 times faster annotation, and uniquely handles complex SVs. We validate SDFA on 895,054 SVs from 150,119 individuals in the UK Biobank dataset.