Registry Forge: an open-source end-to-end pipeline for patient-directed SMART on FHIR registries
Abstract
Objectives
Patient-directed SMART on FHIR lets registries acquire longitudinal electronic health record data, but the payload requires substantial engineering before use. We present Registry Forge, an open-source pipeline that converts this payload into research-ready outputs.
Materials and Methods
Registry Forge decodes and parses mixed C-CDA, HTML, RTF, PDF, and FHIR inputs, joins records to a canonical patient identifier, and emits a browser-viewable dashboard, an OMOP CDM v5.4 data set, GA4GH Phenopackets v2, a code inventory, and regex extractions of disease-specific narrative content.
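As an illustration of the first step only, the sketch below shows one way a mixed payload could be decoded and routed to a format-specific parser. It is not Registry Forge's actual code: the function names `decode_attachment` and `sniff_format`, the assumption that attachments arrive base64-encoded, and the byte signatures chosen are all hypothetical choices made for this example.

```python
import base64
import json


def decode_attachment(b64_data: str) -> bytes:
    """Decode an attachment, assuming (hypothetically) it is base64-encoded."""
    return base64.b64decode(b64_data)


def sniff_format(payload: bytes) -> str:
    """Guess a payload's format from its leading bytes (illustrative heuristics)."""
    head = payload.lstrip()[:2048]
    if head.startswith(b"%PDF-"):
        return "pdf"
    # RTF also starts with "{", so it must be checked before JSON.
    if head.startswith(b"{\\rtf"):
        return "rtf"
    if head.startswith(b"{"):
        try:
            doc = json.loads(payload)
        except ValueError:  # covers invalid JSON and undecodable bytes
            return "unknown"
        # FHIR JSON resources carry a top-level "resourceType" field.
        if isinstance(doc, dict) and "resourceType" in doc:
            return "fhir"
        return "unknown"
    lowered = head.lower()
    # C-CDA documents use a ClinicalDocument root element; check before HTML.
    if b"<clinicaldocument" in lowered:
        return "ccda"
    if b"<!doctype html" in lowered or b"<html" in lowered:
        return "html"
    return "unknown"


if __name__ == "__main__":
    sample = base64.b64encode(b'{"resourceType": "Bundle", "entry": []}').decode()
    print(sniff_format(decode_attachment(sample)))
```

A real pipeline would likely need sturdier checks, such as namespace-prefixed XML roots or byte-order marks, before handing each file to its parser.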
Results
Applied to the ALS Research Collaborative Study (94 participants, 56 US health systems), Registry Forge processed 22,686 source files and 1,791 FHIR Bundles (109,599 resources); only 15.0% of files were full C-CDA.
Discussion
This pipeline generalizes to any registry acquiring data through patient-directed SMART on FHIR.
Conclusion
Registry Forge closes the acquisition-to-analysis gap with no server infrastructure and is openly available.
LAY SUMMARY
Patient registries and natural history studies (referred to as ‘registries’ in this article) are research databases that collect data on people with a particular medical condition, often over time. To work well, they need detailed medical records, obtained either from participants answering surveys or from each participant’s doctors. A federal rule lets participants log in to their hospital’s patient portal and grant a registry access to their own electronic health record. This solves the data collection problem but creates a new one: the downloaded files arrive in many formats and have to be cleaned, combined, and standardized before researchers can use them. Small registries, especially those for rare diseases, often cannot afford the engineers required to do this work.
We built Registry Forge, a free open-source software tool that does this work automatically. We used it for the ALS Research Collaborative Study, a long-running natural history study of amyotrophic lateral sclerosis run by the ALS Therapy Development Institute. Registry Forge processed records from 94 study participants whose care spanned more than 50 US health systems and turned them into research-ready data sets. The tool runs on a standard laptop and is free for anyone to use.