Systematic identification of rare disease patients in electronic health records enables evaluation of clinical outcomes
Abstract
Background
Identifying rare disease (RD) patients in electronic health records (EHR) is challenging because the more than 10,000 rare diseases are typically not captured by clinical coding systems. This limits the assessment of clinical outcomes for RD patients. This study introduces a semiautomated approach that maps RDs to appropriate codes and is applicable across various EHR systems. By improving RD patient identification, this method facilitates the analysis of clinical outcomes and disease severity in the RD population. We demonstrate this using large EHR datasets such as those in the National COVID Cohort Collaborative (N3C), which covers over 21 million patients.
Methods
We developed a semiautomated workflow to enumerate RD-specific SNOMED-CT and ICD-10 codes, starting with 12,003 GARD IDs mapped to Orphanet. This process linked RDs to SNOMED-CT and ICD-10 codes and applied exclusion criteria based on groups of disorders. We then created an extensive list of SNOMED-CT codes with their descendants from OHDSI ATLAS and performed phenotype filtering to remove irrelevant codes. The final list included 12,081 SNOMED-CT codes and 357 ICD-10 codes for further analysis, enabling the identification and mapping of rare diseases in EHR.
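The code-enumeration steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`rd_id`, `is_group_of_disorders`, `snomed_codes`), the function names, and the toy identifiers are all hypothetical, and the exclusion rule is assumed here to mean dropping entries flagged as a group of disorders. The hierarchy is represented as a plain parent-to-children dictionary rather than a real OHDSI vocabulary table.

```python
from collections import deque


def expand_with_descendants(seed_codes, children):
    """Return the seed codes plus all of their descendants in the hierarchy."""
    seen = set(seed_codes)
    queue = deque(seed_codes)
    while queue:
        code = queue.popleft()
        for child in children.get(code, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def build_code_list(rd_records, children, irrelevant_codes):
    """Map each rare disease to its codes, skipping group-of-disorders entries."""
    result = {}
    for rec in rd_records:
        # Assumed exclusion rule: drop entries that denote a group of disorders.
        if rec["is_group_of_disorders"]:
            continue
        codes = expand_with_descendants(rec["snomed_codes"], children)
        # Phenotype filtering, modeled here as removing a set of flagged codes.
        result[rec["rd_id"]] = sorted(codes - irrelevant_codes)
    return result


# Toy, made-up identifiers (not real GARD, Orphanet, or SNOMED-CT codes).
rd_records = [
    {"rd_id": "RD-A", "is_group_of_disorders": False, "snomed_codes": ["S1"]},
    {"rd_id": "RD-B", "is_group_of_disorders": True, "snomed_codes": ["S9"]},
]
children = {"S1": ["S2", "S3"], "S2": ["S4"]}
irrelevant_codes = {"S3"}

code_list = build_code_list(rd_records, children, irrelevant_codes)
print(code_list)
```

In this toy run, `RD-B` is excluded as a group of disorders, and `RD-A` keeps its seed code and descendants except the one flagged as irrelevant.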
Results
Our semiautomated workflow identified 357 RD-specific ICD-10 codes and 12,081 SNOMED-CT codes representing 6,342 RDs, which are categorized into 30 Orphanet linearization classes. We demonstrate the utility of these codes with a preliminary univariate analysis of COVID-19 outcomes in a large cohort of 4,835,718 COVID-19 positive individuals in N3C, of whom 404,735 (8.37%) were identified as having a preexisting RD. Across RD classes, the mortality risk ratios ranged from 0.23 to 5.28 and the hospitalization risk ratios from 0.93 to 3.13 (p-values <0.001).
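For readers less familiar with the measure, a univariate risk ratio compares the proportion of patients with an outcome in one group to the proportion in a reference group. The sketch below is illustrative only: the function name and the example counts are hypothetical, and the Wald-type confidence interval on the log scale is a common textbook choice rather than the method confirmed for this study. The only figures taken from the text are the cohort size and the number of patients with a preexisting RD.

```python
import math


def risk_ratio(exposed_events, exposed_total,
               unexposed_events, unexposed_total, z=1.96):
    """Return the risk ratio and an approximate 95% Wald confidence interval."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for a 2x2 table.
    se = math.sqrt(1 / exposed_events - 1 / exposed_total
                   + 1 / unexposed_events - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Share of the COVID-19 positive cohort with a preexisting RD (counts from the text).
rd_share_percent = 100 * 404_735 / 4_835_718

# Hypothetical counts for one RD class versus a reference group.
rr, lower, upper = risk_ratio(30, 1_000, 100, 10_000)
print(f"RD share: {rd_share_percent:.2f}%")
print(f"Risk ratio: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A risk ratio above 1 indicates a higher risk of the outcome in the RD class than in the reference group, and a value below 1 indicates a lower risk.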
Conclusions
Our systematic, semiautomated workflow enables rapid identification of rare disease patients across diverse EHR systems. We demonstrate its utility by evaluating COVID-19 severity outcomes by rare disease class in the N3C cohort. These findings support the need for targeted preventive healthcare interventions and highlight the potential for future research on long COVID, COVID-19 reinfection, and other outcomes in the rare disease population.