Reliability of COVID-19 data: An evaluation and reflection

This article has been reviewed by the following groups

Abstract

The rapid proliferation of COVID-19 has left governments scrambling, and several data aggregators are now assisting in the reporting of county-level cases and deaths. The many variables affecting reporting (e.g., time delays) necessitate a well-documented reliability study examining the data methods and a discussion of possible causes of differences between aggregators.

Objective

To statistically evaluate the reliability of COVID-19 data across aggregators using case fatality rate (CFR) estimates and reliability statistics.
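
For reference, the crude CFR is simply cumulative deaths divided by cumulative confirmed cases. A minimal sketch in Python (the figures are illustrative, not from the study):

    def case_fatality_rate(deaths: int, cases: int) -> float:
        """Crude CFR: cumulative deaths over cumulative confirmed cases."""
        if cases == 0:
            return float("nan")  # CFR is undefined when no cases are reported
        return deaths / cases

    # Illustrative example: 1,200 deaths among 45,000 confirmed cases
    print(f"CFR = {case_fatality_rate(1200, 45000):.3f}")  # CFR = 0.027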

Design, setting, and participants

Cases and deaths were collected daily by volunteers via state and local health departments as primary sources and newspaper reports as secondary sources. To enable the reliability analysis, BroadStreet also collected data from other COVID-19 aggregator sources, including USAFacts, Johns Hopkins University, the New York Times, and The COVID Tracking Project.
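
A sketch of how such a cross-aggregator comparison might be assembled with pandas appears below; the file names and column layout are hypothetical placeholders, not the aggregators' actual export formats:

    import pandas as pd

    # Hypothetical per-aggregator exports: one row per county per day,
    # keyed by FIPS code, with cumulative death counts.
    sources = {
        "broadstreet": "broadstreet_counties.csv",  # placeholder path
        "usafacts": "usafacts_counties.csv",        # placeholder path
        "jhu": "jhu_counties.csv",                  # placeholder path
        "nyt": "nyt_counties.csv",                  # placeholder path
    }

    frames = []
    for name, path in sources.items():
        df = pd.read_csv(path, parse_dates=["date"], dtype={"fips": str})
        frames.append(df[["date", "fips", "deaths"]].assign(source=name))

    # One column per aggregator, so counts can be compared county by
    # county and fed into agreement statistics.
    panel = pd.concat(frames, ignore_index=True)
    wide = panel.pivot_table(index=["date", "fips"], columns="source",
                             values="deaths")

(The COVID Tracking Project reports at the state rather than the county level, so it would be joined on state instead of FIPS code.)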

Main outcomes and measures

COVID-19 cases and death counts at the county and state levels.

Results

Lower levels of inter-rater agreement were observed across aggregators for death counts, and this disagreement manifested in state-level Bayesian estimates of COVID-19 fatality rates.
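
The abstract does not name the agreement statistic or the Bayesian model used, so the sketch below is only one plausible illustration: Krippendorff's alpha for interval data (aggregators as raters, counties as units) alongside a conjugate Beta-Binomial posterior for a fatality rate:

    import numpy as np
    from scipy import stats

    def krippendorff_alpha_interval(data: np.ndarray) -> float:
        """Krippendorff's alpha for interval data.

        `data` has shape (raters, units) -- here, aggregators x counties --
        with np.nan marking counties an aggregator does not report.
        """
        # Keep only counties reported by at least two aggregators.
        units = [col[~np.isnan(col)] for col in data.T]
        units = [u for u in units if len(u) >= 2]
        n = sum(len(u) for u in units)

        # Observed disagreement: squared differences within each county.
        d_o = sum(((u[:, None] - u[None, :]) ** 2).sum() / (len(u) - 1)
                  for u in units) / n

        # Expected disagreement: squared differences across all pooled values.
        pooled = np.concatenate(units)
        d_e = ((pooled[:, None] - pooled[None, :]) ** 2).sum() / (n * (n - 1))
        return 1.0 - d_o / d_e

    def cfr_posterior(deaths: int, cases: int, a: float = 1.0, b: float = 1.0):
        """Beta(a, b) prior on the fatality rate; a binomial likelihood gives
        a Beta(a + deaths, b + cases - deaths) posterior by conjugacy."""
        post = stats.beta(a + deaths, b + cases - deaths)
        return post.mean(), post.interval(0.95)

    # Illustrative example: three aggregators' death counts for four counties.
    counts = np.array([
        [10.0, 52.0, 7.0, 101.0],    # aggregator A
        [10.0, 50.0, 8.0, 99.0],     # aggregator B
        [11.0, 49.0, np.nan, 98.0],  # aggregator C (one county missing)
    ])
    print(krippendorff_alpha_interval(counts))   # close to 1: high agreement
    print(cfr_posterior(deaths=1200, cases=45000))

An intraclass correlation coefficient would be a reasonable alternative agreement measure; either way, agreement is judged relative to the variability between counties.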

Conclusions and relevance

A national, publicly available data set is needed for current and future disease outbreaks and improved reliability in reporting.

Article activity feed

  1. SciScore for 10.1101/2021.04.25.21256069:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations: While the intent of this study was not to explain aggregator differences, a short discussion of some differences is noteworthy. For example, BroadStreet often showed significant disagreement with other aggregators in several states where it updates historical case and death totals to reflect date of symptom onset, date of diagnosis, or date of death. As a result of the challenges posed in the many stages of data analysis, the reliability and validity of these statistics are critical when creating policies to protect the public and accurately modeling the disease. Disease data validity is imperative and should be the primary objective for any institute, as without validity there can be no reliability. Given that validity cannot be assessed without significant agency and/or government oversight, this study sought to evaluate COVID-19 data reliability, providing insight into the consistency of data across different sources. Fundamentally, the validity of any statistical analysis is based on the quality of the data collected (39,41–46). Moreover, it is critical that aggregators are transparent in their data collection process so users can judge the validity of that process and understand discrepancies in numbers across data collection sources. An important caveat is that the validity of the final data source is largely dependent on the initial sources providing the data (e.g., state officials and hospitals). For this reason, it is critical that mechanisms are al...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex as a biological variable and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including references cited, please follow this link.