Detecting pathogenic structural variation in families with undiagnosed rare disease in a national genome project
Abstract
Background: Whole-genome sequencing (WGS) projects for rare disease diagnosis typically yield a diagnostic rate of approximately 25-40%, depending in particular on patient selection and the extent of prior genetic testing. The Scottish Genomes Partnership (SGP) is a collaborative research programme involving four Scottish Regional Genetics Centres, four Scottish Medical Schools, and Genomics England's 100,000 Genomes Project. It aims to facilitate genome sequencing and diagnosis for patients in the Scottish NHS with suspected rare Mendelian diseases. Within SGP, short-read sequencing (SRS) achieved a diagnostic rate of 23% in affected families.

Methods: To increase the diagnostic yield, we applied Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) to a cohort of 24 SGP families (74 individuals) who remained undiagnosed after SRS. We also re-analysed previously generated SRS data to identify pathogenic structural variants (SVs). We benchmarked several existing software tools for SV detection using LRS and defined key requirements for sample processing and DNA quality. Custom SV prioritisation and bioinformatics pipelines were developed to integrate SV discovery with genotype-phenotype analysis.

Results: Benchmarking showed that minimap2 + cuteSV was optimal for single-sample SV discovery, while minimap2 + Sniffles2 performed best for family-based analysis. SV calling across the cohort yielded 60,022 filtered SVs spanning autosomes and sex chromosomes. Each family had between 23,024 and 25,009 SVs genome-wide (median: 23,814). A total of 392 SVs genome-wide and 8 within a disease-gene panel were prioritised across autosomal dominant/de novo, recessive, compound heterozygous, and X-linked modes of inheritance, with counts varying between families. In three exemplar families, pathogenic or likely pathogenic de novo SVs were identified in both LRS and SRS data: one at the DLX5/6 locus, one in AUTS2, and one in FN1. We provide genome-wide de novo SVs and compound heterozygous (SV + SNV) variants, and have deposited raw and processed sequencing data for all families in the Genomics England Research Environment to support future gene discovery.

Conclusions: This study demonstrates that in-depth SV analysis can increase molecular diagnostic rates in rare disease patients with presumed monogenic aetiology. Pathogenic or likely pathogenic de novo SVs were identified in three families, resolving the diagnostic odyssey for at least two of the 24 families.
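The abstract does not describe how the custom prioritisation pipeline assigns variants to inheritance modes, so the following is only a minimal sketch of how trio genotypes could be sorted into the categories named in the Results (de novo, recessive, X-linked, compound heterozygous). The `TrioCall` record, the function names, and the genotype rules are illustrative assumptions, not the authors' implementation. A real pipeline would also filter on population frequency, genotype quality, and phenotype match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    """Hypothetical record: one variant genotyped in a child and both parents."""
    variant_id: str
    gene: str
    chrom: str
    child: str    # VCF-style genotype string, e.g. "0/1"
    mother: str
    father: str


def alt_count(gt):
    """Count alternate alleles in a genotype string; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def classify(call, child_is_male=False):
    """Assign a single variant to a simple inheritance mode, or None if no rule fits."""
    c, m, f = alt_count(call.child), alt_count(call.mother), alt_count(call.father)
    if None in (c, m, f) or c == 0:
        return None
    if m == 0 and f == 0:
        return "de_novo"      # present in the child, absent from both parents
    if call.chrom in ("X", "chrX") and child_is_male and m >= 1 and f == 0:
        return "x_linked"     # affected male, carrier mother
    if c == 2 and m == 1 and f == 1:
        return "recessive"    # homozygous child, two heterozygous parents
    return None


def compound_het_genes(calls):
    """Find genes with one maternally and one paternally inherited het variant.

    Sex chromosomes get no special handling here (a simplifying assumption).
    """
    by_gene = defaultdict(lambda: {"maternal": [], "paternal": []})
    for call in calls:
        c, m, f = alt_count(call.child), alt_count(call.mother), alt_count(call.father)
        if None in (c, m, f) or c != 1:
            continue
        if m >= 1 and f == 0:
            by_gene[call.gene]["maternal"].append(call.variant_id)
        elif f >= 1 and m == 0:
            by_gene[call.gene]["paternal"].append(call.variant_id)
    return {g: o for g, o in by_gene.items() if o["maternal"] and o["paternal"]}
```

Because `compound_het_genes` groups by gene rather than by variant type, an SV and an SNV in the same gene can be paired, in the spirit of the SV + SNV combinations mentioned above. The inputs would have to come from a jointly genotyped family call set.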