Establishing and characterising large COVID-19 cohorts after mapping the Information System for Research in Primary Care in Catalonia to the OMOP Common Data Model
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Background
Few datasets have been established that capture the full breadth of COVID-19 patient interactions with a health system. Our first objective was to create a COVID-19 dataset that linked primary care data to COVID-19 testing, hospitalisation, and mortality data at a patient level. Our second objective was to provide a descriptive analysis of COVID-19 outcomes among the general population and describe the characteristics of the affected individuals.
Methods
We mapped patient-level data from Catalonia, Spain, to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). More than 3,000 data quality checks were performed to assess the readiness of the database for research. Subsequently, to summarise the COVID-19 population captured, we established a general population cohort as of 1st March 2020 and identified outpatient COVID-19 diagnoses or positive test results for SARS-CoV-2, hospitalisations with COVID-19, and COVID-19 deaths during follow-up, which ran until 30th June 2021.
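The cohort logic described above (a general population observed on an index date, followed for outcomes until a fixed end date) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: only the two study dates come from the text. The classes `ObservationPeriod` and `OutcomeEvent`, the outcome labels, and the rule that follow-up also stops at the end of each person's observation are hypothetical simplifications. In a real OMOP CDM database the observation window would come from the OBSERVATION_PERIOD table and the outcomes from tables such as CONDITION_OCCURRENCE, MEASUREMENT, VISIT_OCCURRENCE and DEATH.

```python
from dataclasses import dataclass
from datetime import date

# Study dates taken from the text; everything else in this sketch is hypothetical.
INDEX_DATE = date(2020, 3, 1)
END_OF_FOLLOW_UP = date(2021, 6, 30)


@dataclass
class ObservationPeriod:
    """Simplified stand-in for an OMOP OBSERVATION_PERIOD row."""
    person_id: int
    start: date
    end: date


@dataclass
class OutcomeEvent:
    """Hypothetical flattened outcome record (diagnosis/test, hospitalisation, death)."""
    person_id: int
    event_date: date
    outcome: str


def general_population(periods: list[ObservationPeriod]) -> dict[int, date]:
    """Return person_id -> end of observation for everyone observed on the index date.

    Assumes at most one observation period per person covers the index date.
    """
    return {p.person_id: p.end for p in periods if p.start <= INDEX_DATE <= p.end}


def count_people_with_outcome(
    periods: list[ObservationPeriod], events: list[OutcomeEvent]
) -> dict[str, int]:
    """Count distinct cohort members with each outcome during follow-up."""
    cohort = general_population(periods)
    people: dict[str, set[int]] = {}
    for e in events:
        obs_end = cohort.get(e.person_id)
        if obs_end is None:
            continue  # not in the general population cohort on the index date
        # Assumed follow-up window: index date to the earlier of the study end
        # and the end of the person's observation.
        if INDEX_DATE <= e.event_date <= min(obs_end, END_OF_FOLLOW_UP):
            people.setdefault(e.outcome, set()).add(e.person_id)
    # Each person is counted at most once per outcome type.
    return {outcome: len(ids) for outcome, ids in people.items()}
```

Counting distinct people per outcome (rather than events) mirrors how the findings report the number of individuals with each outcome; repeated diagnoses or admissions for the same person are collapsed into one.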
Findings
Mapping data to the OMOP CDM was performed and high data quality was observed. The mapped database was used to identify a total of 5,870,274 individuals, who were included in the general population cohort as of 1st March 2020. Over follow-up, 604,472 individuals had either an outpatient COVID-19 diagnosis or a positive test result, 58,991 had a hospitalisation with COVID-19, 5,642 had an ICU admission with COVID-19, and 11,233 had a COVID-19 death. People who were hospitalised or died were more commonly older, male, and with more comorbidities. Those admitted to ICU with COVID-19 were generally younger and more often male than those hospitalised in general and those who died.
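To put the reported counts on a common scale, they can be expressed as crude cumulative proportions of the general population cohort. The short calculation below uses only the figures stated above; the helper name `percent_of_cohort` is illustrative. These proportions are not adjusted for age, sex or time at risk, and they are not incidence rates.

```python
# Counts reported in the findings; the cohort size is the denominator.
COHORT_SIZE = 5_870_274
OUTCOME_COUNTS = {
    "outpatient diagnosis or positive test": 604_472,
    "hospitalisation with COVID-19": 58_991,
    "ICU admission with COVID-19": 5_642,
    "COVID-19 death": 11_233,
}


def percent_of_cohort(count: int, cohort_size: int = COHORT_SIZE) -> float:
    """Crude cumulative proportion of the cohort with an outcome, in percent."""
    return 100 * count / cohort_size


for outcome, n in OUTCOME_COUNTS.items():
    print(f"{outcome}: {n:,} ({percent_of_cohort(n):.2f}%)")
```

This works out to roughly 10.3% of the cohort with an outpatient diagnosis or positive test, 1.0% hospitalised, 0.10% admitted to ICU and 0.19% with a COVID-19 death over follow-up.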
Interpretation
We have established a comprehensive dataset that captures COVID-19 diagnoses, test results, hospitalisations, and deaths in Catalonia, Spain. Extensive data checks have shown the data to be fit for use. From this dataset, a general population cohort of 5.9 million individuals was identified and their COVID-19 outcomes over time were described.
Funding
Generalitat de Catalunya and European Health Data and Evidence Network (EHDEN).
SciScore for 10.1101/2021.11.23.21266734:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics not detected. Sex as a biological variable not detected. Randomization not detected. Blinding not detected. Power Analysis not detected.
Table 2: Resources
Software and Algorithms
- Sentence: "SNOMED, for example, is a standard vocabulary for conditions, while RxNorm codes are a standard vocabulary for drug exposures." Resource: RxNorm (suggested: RxNorm, RRID:SCR_006645)
- Sentence: "Lastly, hospitalisation data, from the conjunt mínim bàsic de dades de l’alta hospitalària (minimum basic set of hospital discharge data) collated by the Data Analysis Program for Health Research and Innovation (PADRIS) in Catalonia, was also linked at the individual-level." Resource: Data Analysis Program (suggested: None)
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Strengths and limitations: Much of the COVID-19 literature is based on studies where study populations have been drawn from people hospitalised with COVID-19, tested for infection, or who volunteered to participate in a study. Such studies can be subject to a number of biases, in particular collider bias, which can lead to the reporting of associations that do not exist for the general population or to attenuating, inflating or reversing the sign of true associations.30 This underscores the importance of developing comprehensive datasets to generate the reliable evidence required to inform decision-making related to the pandemic. With more than half a million outpatient cases of COVID-19 captured and a breadth of data capture that allows for comparisons with the general population and subsequent hospital care to be described, the mapped SIDIAP database described here is one such resource. While electronic health record data brings numerous opportunities, the data are collected for non-research purposes, so careful curation is required. Using a well-established common data model meant that existing open-source tools could be used to evaluate data quality and that research studies can be run in a distributed manner. This has allowed the database to already have been used in a number of international network research studies, with standardised analytic packages and only aggregated results sets shared. One limitation of the dataset has been seen with the likely underreporting of COVID-...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.