Community-Driven Copy Number Variant Discovery at Scale: Results from a Rare Disease Genomics Hackathon
Abstract
Purpose
Copy number variants (CNVs) are a major contributor to rare genetic diseases, but their detection and interpretation from short-read genome sequencing (srGS) data remain challenging, especially at scale. Large amounts of existing srGS data remain under-analyzed for clinically relevant CNVs.
Methods
During a collaborative Hackathon, we developed and applied scalable CNV analysis workflows to srGS data from three unsolved, exome-negative rare disease cohorts: a Primary Immunodeficiency cohort (N = 39), a Turkish developmental disorders cohort (N = 31), and the Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) cohort (N = 1437). We used Parliament2 for structural variant (SV) calling, Mosdepth and SLMSuite for read-depth–based quality control and CNV detection, and R Shiny-based tools for visualization. We also constructed an SV/CNV database annotated with population frequency and pathogenicity, applied DBSCAN clustering to estimate internal allele frequencies, and used a three-way annotation strategy to aid interpretation.
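The abstract does not describe how the DBSCAN step was configured, so the following is only an illustrative sketch of the general idea and not the Hackathon's implementation. It assumes that calls of the same SV type on the same chromosome are clustered by their (start, end) breakpoints, and that the fraction of cohort samples carrying a cluster serves as a proxy for internal allele frequency. The function names (`dbscan`, `internal_frequency`), the 500 bp tolerance, `min_samples=1`, and the example calls are all hypothetical. A small pure-Python DBSCAN is used so the sketch has no dependencies.

```python
from collections import defaultdict


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN over (start, end) pairs; returns one label per point (-1 = noise)."""
    def neighbors(i):
        # Chebyshev distance: both breakpoints must lie within eps base pairs.
        return [j for j, q in enumerate(points)
                if max(abs(points[i][0] - q[0]), abs(points[i][1] - q[1])) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


def internal_frequency(calls, n_samples, eps=500, min_samples=1):
    """Map each call (sample, chrom, svtype, start, end) to its cohort carrier fraction.

    Assumption: carrier fraction is used as a stand-in for allele frequency,
    because genotypes are not modelled in this sketch.
    """
    groups = defaultdict(list)
    for call in calls:
        groups[(call[1], call[2])].append(call)  # cluster per chromosome and SV type

    freqs = {}
    for group in groups.values():
        labels = dbscan([(c[3], c[4]) for c in group], eps, min_samples)
        carriers = defaultdict(set)
        for call, label in zip(group, labels):
            if label != -1:
                carriers[label].add(call[0])
        for call, label in zip(group, labels):
            # Noise points are treated as variants seen in a single sample.
            n_carriers = len(carriers[label]) if label != -1 else 1
            freqs[call] = n_carriers / n_samples
    return freqs


# Hypothetical calls from a four-sample cohort.
calls = [
    ("S1", "chr1", "DEL", 10000, 15000),
    ("S2", "chr1", "DEL", 10120, 15080),
    ("S3", "chr1", "DEL", 9950, 14900),
    ("S4", "chr1", "DEL", 50000, 52000),
    ("S1", "chr2", "DUP", 7000, 9000),
]
freqs = internal_frequency(calls, n_samples=4)
```

In this toy cohort, the three overlapping chr1 deletions fall into one cluster with a carrier fraction of 0.75, while the remaining calls are private to one sample (0.25). A density-based method suits this task because it does not require the number of distinct variants to be specified in advance; the breakpoint tolerance would need tuning to the caller's positional precision.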
Results
Our pipelines identified high-confidence CNVs and streamlined interpretation across cohorts. Within 2 days, the Hackathon yielded 39 candidate pathogenic SVs. The tools and workflows enabled rapid filtering, prioritization, and visualization of clinically relevant variants.
Conclusion
This community-driven effort demonstrates the feasibility and utility of scalable CNV analysis for accelerating diagnosis and discovery in rare disease cohorts using srGS data.