Study Data Element Mapping: Feasibility of Defining Common Data Elements Across COVID-19 Studies

Abstract

Background: Numerous clinical studies investigating aspects of COVID-19 are now underway. The aim of this study was to identify a selection of national and/or multicentre clinical COVID-19 studies in the United Kingdom and to examine the feasibility and outcomes of documenting the data elements most frequently shared across them. The goals were to rapidly inform future study design and to demonstrate proof of concept for further subject-specific study data element mapping to improve research data management.

Methods: Twenty-five COVID-19 studies were included. For each, the specific data elements being collected were recorded. The collated data elements were arbitrarily divided into categories for ease of visualisation. The elements recorded most frequently and consistently across studies are presented according to their relative commonality.

Results: Across the 25 studies, 261 data elements were recorded in total. The 100 most frequently recorded data elements across all studies were identified and are presented with their relative frequencies. The categories with the largest numbers of common elements were demographics, admission criteria, medical history and investigations. Mortality and the need for specific respiratory support were the most common outcome measures, although individual studies also included a range of other outcome measures.

Conclusion: This study demonstrates that it is feasible to collate the specific data elements recorded across a range of studies investigating a single clinical condition and so identify the elements most common among them. These data may be of value to those establishing new studies, and may allow researchers to rapidly identify studies collecting data of potential use, thereby minimising duplication and increasing data re-use and interoperability.
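The counting step described in the Methods and Results (tallying how many studies record each data element, then ranking the most common ones with their relative frequencies) can be sketched in a few lines. This is a minimal illustration, not the authors' method (their analysis was performed in Microsoft Excel). It assumes each study's data elements have already been harmonised to a shared name. The function names, the tie-breaking rule (alphabetical) and the toy element names are all hypothetical.

```python
from collections import Counter


def element_frequencies(studies: dict[str, set[str]]) -> Counter:
    """Count in how many studies each data element is recorded.

    `studies` maps a hypothetical study identifier to the set of
    (already harmonised) data element names that the study collects.
    """
    counts: Counter = Counter()
    for elements in studies.values():
        # Convert to a set so an element is counted at most once per study.
        counts.update(set(elements))
    return counts


def top_elements(studies: dict[str, set[str]], top_n: int = 100) -> list[tuple[str, int, float]]:
    """Return (element, study count, fraction of studies) for the top_n elements.

    Elements are ranked by descending study count; ties are broken
    alphabetically (an assumption made here for a deterministic order).
    """
    counts = element_frequencies(studies)
    n_studies = len(studies)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, count, count / n_studies) for name, count in ranked[:top_n]]
```

For example, with three toy studies recording `{"age", "sex", "mortality", "oxygen_therapy"}`, `{"age", "sex", "mortality"}` and `{"age", "d_dimer"}`, `len(element_frequencies(...))` gives the total number of distinct elements (5 here), analogous to the 261 reported above, and `top_elements(..., top_n=3)` ranks `age` first (recorded in all three studies), followed by `mortality` and `sex` (two of three studies each).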

  1. SciScore for 10.1101/2020.05.19.20106641:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Analysis was performed using Microsoft Excel (Microsoft Corp, Seattle).
    Resource: Microsoft Excel
    Suggested: Microsoft Excel, RRID:SCR_016137

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The main limitations of the study are that the trials and studies included represent only a small proportion of the total registered COVID-19 studies. Nevertheless, several major large UK studies are included representing a range of COVID-19 trials, and the intention of the current work was to demonstrate the feasibility of the approach and to rapidly identify the major common data elements among studies investigating COVID-19. In addition, these present data have been derived from interpretation of study protocols and eCRF documentation where formal lists of data elements were not provided, therefore it is likely that some elements have been subjectively categorised. Ideally, strict definitions are provided for each data element with clear links to open data dictionaries, but this was rarely the case. Furthermore, the vast majority of the studies included did not provide easily accessible lists of data elements along with their definitions and terminologies, meaning that some subjective interpretation was often required in terms of the specific element usage. The majority of the studies were also associated with their own eCRF, which illustrates the issue that if many studies are underway concurrently in a similar disease area and are requiring a subset of common data elements it is highly inefficient to have multiple individual eCRF instances, most of which would usually require manual data entry of some kind. A far more efficient approach would be to collect the common dat...

    Results from TrialIdentifier: No clinical trial numbers were referenced.

    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

    Results from JetFighter: We did not find any issues relating to colormaps.

    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.