Community-Driven Copy Number Variant Discovery at Scale: Results from a Rare Disease Genomics Hackathon

Abstract

Purpose

Copy number variants (CNVs) are a major contributor to rare genetic diseases, but their detection and interpretation from short-read genome sequencing (srGS) data remain challenging, especially at scale. Large amounts of existing srGS data remain under-analyzed for clinically relevant CNVs.

Methods

During a collaborative Hackathon, we developed and applied scalable CNV analysis workflows to srGS data from three cohorts of unsolved, exome-negative rare disease cases: primary immunodeficiency (N = 39), Turkish developmental disorders (N = 31), and the Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) consortium (N = 1437). We used Parliament2 for structural variant (SV) calling, Mosdepth and SLMSuite for read-depth–based quality control and CNV detection, and R Shiny-based tools for visualization. We also built an SV/CNV variant database annotated with population frequency and pathogenicity, applied DBSCAN clustering to estimate internal allele frequencies, and used a three-way annotation strategy to aid interpretation.
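The abstract does not say how DBSCAN was parameterized for internal frequency estimation. The sketch below shows one plausible approach, assuming calls are clustered separately per chromosome and SV type, with distance defined as 1 minus reciprocal overlap. The values `eps=0.5` (at least 50% reciprocal overlap) and `min_samples=2` are hypothetical, as are the `CNVCall` record, the `internal_frequencies` helper, and the frequency definition (distinct carriers in a cluster divided by cohort size). None of these should be read as the authors' implementation. DBSCAN is written out in plain Python to keep the example dependency-free; scikit-learn's `DBSCAN` with `metric="precomputed"` would be a drop-in alternative on a distance matrix.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """Hypothetical minimal CNV/SV record; 0-based, half-open, assumes end > start."""
    sample: str
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap_distance(a: CNVCall, b: CNVCall) -> float:
    """Return 1 - reciprocal overlap: 0.0 for identical intervals, 1.0 if disjoint."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 1.0
    return 1.0 - min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def dbscan(points, eps, min_samples, dist):
    """Plain O(n^2) DBSCAN; returns one cluster label per point, -1 for noise."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the standard definition.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels


def internal_frequencies(calls, cohort_size, eps=0.5, min_samples=2):
    """Map each call to (distinct carriers in its cluster) / cohort_size."""
    groups = defaultdict(list)
    for call in calls:
        groups[(call.chrom, call.svtype)].append(call)
    freq = {}
    for group in groups.values():
        labels = dbscan(group, eps, min_samples, reciprocal_overlap_distance)
        # Noise calls are treated as private sites, each forming its own cluster.
        keys = [lab if lab != -1 else ("noise", k) for k, lab in enumerate(labels)]
        carriers = defaultdict(set)
        for call, key in zip(group, keys):
            carriers[key].add(call.sample)
        for call, key in zip(group, keys):
            freq[call] = len(carriers[key]) / cohort_size
    return freq


if __name__ == "__main__":
    calls = [
        CNVCall("s1", "chr1", 1000, 2000, "DEL"),
        CNVCall("s2", "chr1", 1050, 2020, "DEL"),
        CNVCall("s3", "chr1", 990, 1980, "DEL"),
        CNVCall("s4", "chr1", 50000, 52000, "DEL"),
    ]
    # s1-s3 share one cluster (0.3); s4 is a private call (0.1)
    for call, f in internal_frequencies(calls, cohort_size=10).items():
        print(call.sample, f)
```

Counting distinct samples per cluster keeps a sample with several fragmented calls at one site from inflating that site's frequency. The quadratic neighbor search is fine for a sketch, but a cohort the size of GREGoR would likely need interval-tree or sorted-sweep neighbor queries.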

Results

Our pipelines identified high-confidence CNVs and streamlined interpretation across cohorts. Within 2 days, the Hackathon yielded 39 candidate pathogenic SVs. The tools and workflows enabled rapid filtering, prioritization, and visualization of clinically relevant variants.

Conclusion

This community-driven effort demonstrates the feasibility and utility of scalable CNV analysis for accelerating diagnosis and discovery in rare disease cohorts using srGS data.
