Leveraging nationwide health care records in Estonia to identify the genetic background of understudied disease phenotypes
Abstract
Nationwide health records linked to population biobanks can expand genetic discovery into clinical phenotypes that are poorly captured in hospital-centred datasets. We performed genome-wide association analyses of 5,491 ICD-10-based disease phenotypes in 206,159 Estonian Biobank (EstBB) participants using imputed genotype data across 18.8 million single-nucleotide and insertion-deletion variants.
Across the disease phenome, we identified 3,222 genome-wide significant loci, with the strongest added value for outpatient-enriched, recurrent, and earlier-onset conditions. Fine-mapping prioritised candidate causal variants across study-wide significant loci, while coding variant analyses identified 754 protein-altering variant-trait associations outside the HLA region, including high-confidence signals in dermatological, anaemia, congenital, and metabolic traits. A separate HLA analysis identified 744 HLA-trait associations across infectious, autoimmune, and skin-related phenotypes.
As an example of discovery in an understudied phenotype, we highlight pityriasis versicolor, a superficial fungal infection for which an EstBB-FinnGen meta-analysis identified 34 loci, including a rare, Estonian-enriched, splice-disrupting TNFSF15 variant. These results establish EstBB as a resource for mapping genetic architecture across clinically diverse and underrepresented disease phenotypes.
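
The abstract distinguishes genome-wide from study-wide significant loci without stating the thresholds. The sketch below illustrates one common way such a distinction is drawn. It assumes the conventional genome-wide threshold of 5e-8 and a Bonferroni correction across the 5,491 phenotypes; both are assumptions for illustration, not thresholds confirmed by the abstract, and the function names are hypothetical.

```python
# Illustrative only: the abstract does not state the thresholds the authors used.
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide threshold (assumed)
N_PHENOTYPES = 5491        # number of ICD-10-based phenotypes, from the abstract


def study_wide_threshold(alpha: float = GENOME_WIDE_ALPHA,
                         n_tests: int = N_PHENOTYPES) -> float:
    """Bonferroni-correct the genome-wide threshold for the number of phenotypes tested (assumed scheme)."""
    return alpha / n_tests


def classify(p_value: float) -> str:
    """Label an association p-value against the two assumed thresholds."""
    if p_value < study_wide_threshold():
        return "study-wide"
    if p_value < GENOME_WIDE_ALPHA:
        return "genome-wide"
    return "not significant"


if __name__ == "__main__":
    print(f"assumed study-wide threshold: {study_wide_threshold():.2e}")
    for p in (1e-12, 1e-9, 1e-5):
        print(p, classify(p))
```

Under these assumptions, a locus counts as study-wide significant only if it survives correction for every phenotype tested, which is a stricter criterion than genome-wide significance for a single trait.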