Long-read sequencing resolves the clinically relevant CYP21A2 locus, supporting a new clinical test for Congenital Adrenal Hyperplasia
Abstract
Congenital Adrenal Hyperplasia (CAH), one of the most common inherited disorders, is caused by defects in adrenal steroidogenesis. It is potentially lethal if untreated and is associated with multiple comorbidities, including fertility issues, obesity, insulin resistance, and dyslipidemia. CAH can result from variants in multiple genes, but the most frequent cause is deletions and conversions in the segmentally duplicated RCCX module, which contains the CYP21A2 gene and a pseudogene.
The molecular genetic test to identify pathogenic alleles is cumbersome, incomplete, and available from only a limited number of laboratories. It requires testing parents for accurate interpretation, leading to healthcare inequity. Less severe forms are frequently misdiagnosed, and phenotype/genotype correlations are incompletely understood. We explored whether emerging technologies could be leveraged to identify all pathogenic alleles of CAH, including phasing in proband-only cases. We targeted long-read sequencing outputs that would be practical in a clinical laboratory setting.
Both HiFi-based and nanopore-based whole-genome long-read sequencing datasets could be mined to accurately identify pathogenic single-nucleotide variants, full gene deletions, and fusions creating non-functional hybrids between the gene and pseudogene (“30-kb deletion”), as well as to count the number of RCCX modules and phase the resulting multimodular haplotypes. On the HiFi dataset of 6 samples, the PacBio Paraphase tool was able to distinguish nine different mono-, bi-, and tri-modular haplotypes, as well as the 30-kb and whole-gene deletions. To do the same on the ONT nanopore dataset, we designed a tool, Parakit, which creates an enriched local pangenome to represent known haplotype assemblies and map ClinVar pathogenic variants and fusions onto them. Because the region contains few labels, optical genome mapping could not reliably resolve module counts or fusions, although a tool designed to mine the dataset specifically for this region may make this possible in the future.
Both sequencing techniques yielded congruent results that matched the clinically identified variants and offered information beyond the clinical test, including phasing, RCCX module count, and the status of the other module genes, all of which may be of clinical relevance. Thus, long-read sequencing could be used to identify variants causing multiple forms of CAH in a single test.