Problems with evidence assessment in COVID-19 health policy impact evaluation: a systematic review of study design and evidence strength

Abstract

Assessing the impact of COVID-19 policies is critical for informing future policies. However, there are concerns about the overall strength of COVID-19 impact evaluation studies, given the circumstances under which they were conducted and concerns about the publication environment.

Methods

We included studies that were primarily designed to estimate the quantitative impact of one or more implemented COVID-19 policies on direct SARS-CoV-2 and COVID-19 outcomes. After searching PubMed for peer-reviewed articles published on 26 November 2020 or earlier and screening, all studies were reviewed by three reviewers first independently and then to consensus. The review tool was based on previously developed and released review guidance for COVID-19 policy impact evaluation.

Results

Of 102 articles identified as potentially meeting inclusion criteria, 36 published articles evaluated the quantitative impact of COVID-19 policies on direct COVID-19 outcomes. Nine studies were set aside because their study design was considered inappropriate for COVID-19 policy impact evaluation (n=8 pre/post; n=1 cross-sectional), and the remaining 27 articles received a full consensus assessment. Of these, 20/27 met criteria for graphical display of data, 5/27 for functional form, 19/27 for timing between policy implementation and impact, and only 3/27 for concurrent changes to the outcomes. Only 4/27 were rated as appropriate overall. Including the nine studies set aside, reviewers found that only four of the 36 identified published and peer-reviewed health policy impact evaluation studies passed a set of key design checks for identifying the causal impact of policies on COVID-19 outcomes.

Discussion

The reviewed literature directly evaluating the impact of COVID-19 policies largely failed to meet key design criteria for inference of sufficient rigour to be actionable by policy-makers. More reliable evidence review is needed to both identify and produce policy-actionable evidence, alongside the recognition that actionable evidence is often unlikely to be feasible.

Article activity feed

  1. SciScore for 10.1101/2021.01.21.21250243:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 2: Resources

    Software and Algorithms

    Sentence: "Citation counts for accepted articles were obtained through Google Scholar on January 11, 2021."
    Resource: Google
    suggested: (Google, RRID:SCR_017097)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

    At minimum, the flaws and limitations in their inference could have been communicated at the time of publication, when they are needed most. In other cases, it is plausible that many of these studies would not have been published had a more thorough or better targeted methodological review been performed. This systematic strength of evidence review was not without limitations. The tool itself was limited to a very narrow - albeit critical - set of items. The studies may have made other contributions to the literature that we did not evaluate. While the guidance provided a well-structured framework and our reviewer pool was well-qualified, strength of evidence review is inherently subjective. It is plausible and likely that other sets of reviewers would come to different conclusions. Most importantly, this review does not cover all policy inference in the scientific literature. One large literature from which there may be COVID-19 policy evaluation otherwise meeting our inclusion criteria are pre-prints. Many pre-prints would likely fare well in our review process. Higher strength papers often require more time for review and publication, and many high quality papers may be in the publication pipeline at the moment. Second, this review excluded studies that had a quantitative impact evaluation as a secondary part of the study (e.g., to estimate parameters for microsimulation or disease modeling). Not only are these assessments not the primary purpose of those studies, they als...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
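    To make the RRID check concrete: RRIDs appear in manuscripts as strings like "RRID:SCR_017097" (software) or "RRID:AB_2298772" (antibodies), and detecting their presence amounts to pattern matching over the manuscript text. The sketch below is purely illustrative and assumes a simplified RRID format; it is not SciScore's actual implementation, and the `find_rrids` helper name and regex are this example's own.

```python
import re

# Simplified RRID pattern: a prefix of capital letters (e.g. SCR, AB, CVCL),
# an underscore, then an alphanumeric accession. Real-world RRID syntax has
# more variants; this sketch only covers the common "RRID:PREFIX_ID" shape.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Z]+_[A-Za-z0-9]+)")

def find_rrids(text: str) -> list[str]:
    """Return all RRID accession strings found in a block of text."""
    return RRID_PATTERN.findall(text)

print(find_rrids("Google suggested: (Google, RRID:SCR_017097)"))
# prints ['SCR_017097']
```

    A tool like SciScore would go further than presence detection, e.g. resolving each accession against a registry to verify it is correct, but that lookup step is outside this sketch.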

  2. SciScore for 10.1101/2021.01.21.21250243:


    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.

