Automated production of research data marts from a canonical fast healthcare interoperability resource data repository: applications to COVID-19 research

Abstract

Objective

The rapidly evolving COVID-19 pandemic has created a need for timely data from healthcare systems for research. To meet this need, several large new data consortia have been developed that require frequent updating and sharing of electronic health record (EHR) data in different common data models (CDMs) to create multi-institutional databases for research. Traditionally, each CDM has had a custom pipeline for extract, transform, and load (ETL) operations that produce and incrementally update data feeds to the networks from raw EHR data. However, COVID-19 research demands far more timely data, and updates must be delivered faster than in previous collaborative research using national data networks. New approaches are needed to address these demands.

Methods

In this article, we describe the use of the Fast Healthcare Interoperability Resource (FHIR) data model as a canonical data model and the automated transformation of clinical data to the Patient-Centered Outcomes Research Network (PCORnet) and Observational Medical Outcomes Partnership (OMOP) CDMs for data sharing and research collaboration on COVID-19.
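
To make the idea of a canonical-to-CDM transformation concrete, the following is a minimal sketch of mapping a single FHIR Patient resource to a row shaped like the OMOP `person` table. It is not the authors' pipeline: the function name `fhir_patient_to_omop_person` and the caller-supplied `person_id` are assumptions for illustration, and only a handful of columns are populated. The gender concept IDs (8507, 8532) are the standard OMOP concepts for male and female; 0 is used when no concept matches.

```python
from typing import Any, Dict, Optional

# Standard OMOP concept IDs for gender; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def fhir_patient_to_omop_person(patient: Dict[str, Any], person_id: int) -> Dict[str, Optional[Any]]:
    """Map a FHIR Patient (as a parsed JSON dict) to an OMOP-style person row.

    Hypothetical helper for illustration; a production mapping would also
    handle race, ethnicity, location, and provenance.
    """
    if patient.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")

    # FHIR birthDate may be YYYY, YYYY-MM, or YYYY-MM-DD.
    birth = patient.get("birthDate", "")
    parts = birth.split("-") if birth else []

    def part(i: int) -> Optional[int]:
        return int(parts[i]) if len(parts) > i else None

    gender = patient.get("gender")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(gender, 0),
        "year_of_birth": part(0),
        "month_of_birth": part(1),
        "day_of_birth": part(2),
        "person_source_value": patient.get("id"),
        "gender_source_value": gender,
    }


example_patient = {
    "resourceType": "Patient",
    "id": "pat-001",
    "gender": "female",
    "birthDate": "1980-04",
}
person_row = fhir_patient_to_omop_person(example_patient, person_id=1)
```

Because partial birth dates are legal in FHIR, the sketch leaves the missing components as `None` rather than guessing a day or month.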

Results

FHIR data resources could be transformed to operational PCORnet and OMOP CDMs with minimal production delays through a combination of real-time and postprocessing steps, leveraging the FHIR data subscription feature.
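
For readers unfamiliar with the FHIR subscription feature, the sketch below builds a Subscription resource in the standard FHIR R4 shape, asking a server to notify a REST endpoint whenever a matching resource is created or updated. The article notes that the authors relied on proprietary extensions of the subscription specification, so this shows only the baseline standard form. The helper name `build_subscription`, the search criteria, and the endpoint URL are hypothetical.

```python
import json
from typing import Any, Dict


def build_subscription(criteria: str, endpoint: str, reason: str) -> Dict[str, Any]:
    """Return a FHIR R4 Subscription resource as a plain dict.

    Hypothetical helper; the resource would normally be POSTed to the
    FHIR server's Subscription endpoint by a client of your choice.
    """
    return {
        "resourceType": "Subscription",
        "status": "requested",  # the server activates it after validation
        "reason": reason,
        "criteria": criteria,  # a FHIR search string evaluated on each change
        "channel": {
            "type": "rest-hook",
            "endpoint": endpoint,
            "payload": "application/fhir+json",
        },
    }


subscription = build_subscription(
    criteria="Observation?category=laboratory",
    endpoint="https://etl.example.org/fhir-hook",  # assumed URL
    reason="Feed new laboratory results to the research data mart",
)
subscription_json = json.dumps(subscription, indent=2)
```

A downstream transformer listening on the endpoint could then convert each delivered resource to the target CDM as it arrives, with slower linkage steps deferred to postprocessing.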

Conclusions

The approach leverages evolving standards for the availability of EHR data developed to facilitate data exchange under the 21st Century Cures Act and could greatly enhance the availability of standardized datasets for research.

Article activity feed

  1. SciScore for 10.1101/2021.03.11.21253384:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations: The above approach is standards-based but leverages proprietary extensions of the FHIR subscription specification. As discussed above, there are inherent limitations in the speed with which clinical data can be integrated into any analytic model such as OMOP or PCORnet. For example, orders or laboratory data cannot be linked to an encounter that does not (yet) exist. The use of the persistence module for the transformation of data is novel and computationally efficient but implements rules in compiled Java code, where changes may be more difficult. Ultimately, some manual ETL was still deemed optimal in the production pipeline; however, future work may reduce these requirements.

    Summary: The use of FHIR standard as a canonical representation of clinical data with the subsequent dynamic transformation to other research CDMs for analytics is a practical approach to accelerate the availability of data for research and may be particularly useful for evolving diseases such as COVID-19. While it is theoretically possible to fully automate transformation to near real-time versions of OMOP or PCORnet databases, it is more practical given the evolving nature of data to take a staged approach for models for longitudinal data analysis applications.
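
    The encounter-linkage limitation quoted above can be illustrated with a minimal sketch of a staged loader: records that reference an encounter not yet seen are held back and released once that encounter arrives. This is an assumption about how such staging might look, not the authors' implementation; the class `StagedLoader` and the `encounter_id` field are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class StagedLoader:
    """Hold records whose encounter is unknown until the encounter arrives."""

    def __init__(self) -> None:
        self.known_encounters: Set[str] = set()
        self.loaded: List[dict] = []  # records ready for the CDM
        self.pending: Dict[str, List[dict]] = defaultdict(list)

    def add_encounter(self, encounter_id: str) -> None:
        # Register the encounter, then release anything waiting on it.
        self.known_encounters.add(encounter_id)
        self.loaded.extend(self.pending.pop(encounter_id, []))

    def add_record(self, record: dict) -> None:
        # Load immediately if the encounter exists; otherwise stage it.
        encounter_id = record["encounter_id"]
        if encounter_id in self.known_encounters:
            self.loaded.append(record)
        else:
            self.pending[encounter_id].append(record)


loader = StagedLoader()
loader.add_record({"encounter_id": "enc-1", "type": "lab"})  # arrives early
staged_before = len(loader.loaded)
loader.add_encounter("enc-1")  # the encounter now exists
staged_after = len(loader.loaded)
```

    In this sketch the laboratory record is invisible to the analytic model until its encounter is registered, which mirrors why some steps are better handled in postprocessing than in real time.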

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.