An experimental test of the effects of redacting grant applicant identifiers on peer review outcomes

Abstract

Background:

Blinding reviewers to applicant identity has been proposed to reduce bias in peer review.

Methods:

This experimental test used 1,200 NIH grant applications: 400 from Black investigators, 400 matched applications from White investigators, and 400 randomly selected applications from White investigators. Applications were reviewed in standard and redacted formats by mail review (written critiques without panel discussion).

Results:

Redaction reduced, but did not eliminate, reviewers’ ability to correctly guess features of identity. The primary, preregistered analysis hypothesized a differential effect of redaction according to investigator race in the matched applications. A set of secondary analyses (not preregistered) used the randomly selected applications from White scientists and tested the same interaction. Both analyses revealed similar effects: standard-format applications from White investigators scored better than those from Black investigators. Redaction cut the size of the difference by about half (e.g., from a Cohen’s d of 0.20 to 0.10 in the matched applications); redaction caused applications from White scientists to score worse but had no effect on scores for Black applications.
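
For readers parsing the effect sizes above, a minimal sketch: Cohen’s d is the standardized mean difference in scores, and the preregistered test is a race-by-format interaction. The sketch uses only the values reported in this abstract; the model notation is illustrative only and does not reproduce the paper’s exact specification (covariates, reviewer clustering, etc.).

\[
d = \frac{\bar{x}_{\text{White}} - \bar{x}_{\text{Black}}}{s_{\text{pooled}}}, \qquad
\frac{d_{\text{redacted}}}{d_{\text{standard}}} \approx \frac{0.10}{0.20} = \frac{1}{2}
\]

\[
\text{Score} = \beta_0 + \beta_1\,\mathrm{Black} + \beta_2\,\mathrm{Redacted} + \beta_3\,(\mathrm{Black} \times \mathrm{Redacted}) + \varepsilon
\]

Here \(\beta_3\) captures the quantity of interest, the differential effect of redaction by investigator race; the reported pattern corresponds to a redaction effect concentrated in the applications from White investigators.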

Conclusions:

Grant-writing considerations and halo effects are discussed as competing explanations for this pattern. The findings support further evaluation of peer review models that diminish the influence of applicant identity.

Funding:

Funding was provided by the NIH.

Article activity feed

  1. Author Response:

    Reviewer #2 (Public Review):

    This manuscript details an investigation into whether blinding NIH grant reviewers to applicant name and institution may affect their review scores. The authors demonstrate that unblinded review leads to slightly higher scores for White applicants than for Black applicants; however, a deeper dive demonstrates that grantsmanship and a history of prior funding can be even greater predictors of scores, regardless of race.

    Overall, the manuscript touches on a presently in-vogue topic: equality of outcomes and systemic racism. The major limitations of the study, however, ironically demonstrate the very problem the manuscript tries to address. There is no consideration or mention of applications from Asian, Hispanic, or Native American applicants; the authors distill the problem down to only Black and White.

    We now incorporate more of this perspective.

    We rewrote the introduction, adding information about funding rates for Hispanic and Asian PIs (the two largest groups of minority applicants), and provided a stronger explanation for why this study focused on Black-White differences only (lines 86-95). Our aim was to provide broader context while keeping the introduction reasonably focused. Demographic differences in application numbers, review outcomes, and funding success are a complex topic, not easily presented concisely. More importantly, we think that this information, while no doubt of interest to some, is not relevant background to the experiment at hand. We tried to strike a balance between context and focus.

  2. Evaluation Summary:

    This manuscript by Nakamura et al. provides additional information on the impact of bias in the scientific review process. The authors find that, on average, applications from White scientists scored better than those from Black scientists. However, blinding reviewers to race worsened the scores of the White scientists with no impact on the scores for the Black scientists. In view of the recognized value of increased diversity in science, the additional information provided in this manuscript adds new data to this discussion. Potential solutions to this bias, though, are complex.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    The manuscript by Nakamura et al. builds on previous publications reporting that research grants from Black investigators scored worse and were less successful than those from White investigators. The authors of the current study examine the role of several variables (PI race, gender, degree, career stage, etc.) in the scoring of scientific reviews, and in particular whether blinding reviewers to PI identity affected review scores. They examined 1,200 applications, broken down into 400 from Black investigators, 400 from matched White investigators, and 400 from unmatched/randomly selected White applicants. The authors find that applications from White applicants scored better than those from Black applicants. When identifying information was redacted, scores worsened for the White applicants but were unaffected for the Black applicants.

    The study has several strengths, including that it used an experienced group of NIH grant reviewers (the first author, Dr. Nakamura, was previously head of the Center for Scientific Review) and that the grant applications were actual grants submitted to NIH between 2014 and 2015. The study also has several inherent weaknesses, including that it is very hard to completely blind reviewers to variables such as institution, career stage, PI identity, and race, and that redaction is very time consuming and may affect the overall impact of the submitted application. Nevertheless, the results provide new information to help better define the issues surrounding reviewer bias and potentially aid in the search for possible solutions.

  4. Reviewer #2 (Public Review):

    This manuscript details an investigation into whether blinding NIH grant reviewers to applicant name and institution may affect their review scores. The authors demonstrate that unblinded review leads to slightly higher scores for White applicants than for Black applicants; however, a deeper dive demonstrates that grantsmanship and a history of prior funding can be even greater predictors of scores, regardless of race.

    Overall, the manuscript touches on a presently in-vogue topic: equality of outcomes and systemic racism. The major limitations of the study, however, ironically demonstrate the very problem the manuscript tries to address. There is no consideration or mention of applications from Asian, Hispanic, or Native American applicants; the authors distill the problem down to only Black and White.

  5. Reviewer #3 (Public Review):

    This is a study of peer review merit evaluation for a set of R01-mechanism grant applications that were originally reviewed in the National Institutes of Health's Center for Scientific Review in 2014 and 2015. The goal was to determine the effects of redacting the identity of the scientists serving as Principal Investigator (PI) on the merit scoring of these applications by experienced peer reviewers. The explicit focus of this study was on whether differential effects of redacting identity were obtained for applications with Black versus White PIs. The study fits into a broader context in which the NIH has identified a consistent, very large reduction in the rate of funding for applications with Black PIs, and it represents an attempt to determine whether explicit knowledge of the race of PIs affects peer review scoring of the applications. The major outcome of the study was that redaction of PI identity reduced the Black/White gap by about half; this was due to a reduction in the scores of identity-redacted applications with White PIs, relative to their scores with identity provided.

    Major strengths of this work include pre-registration of the design and analytical plan for the study, the inclusion of two control data sets with White PIs (one matched to the primary experimental group, one randomly selected), selection of reviewers from a pool with experience on the study sections in which the target applications had been reviewed, and assessment of the success of reviewer blinding with post-review surveying.

    There are several methodological issues that limit confidence in translating these experimental results to actual NIH review processes. Of the nine scientific review officers who were used to recruit and assign reviewers, only four had any prior experience with real review panels; the integral role that reviewer recruitment and application assignment plays in scoring was thus only poorly reproduced in this study. In a related vein, the reviewers provided written critiques but did not meet in study section panels; this eliminates a substantial part of the determination of an application's voted score. Individual reviewers also completed a wide range of reviews (1-29), which has major implications for how a given application is reviewed, since in real panels the reviewer "load" is much more consistent. These concerns are supported by the perplexing finding of an advantage for the applications with White PIs in the experimental standard-identity review condition, given that this group was matched for original scores with the applications with Black PIs.

    The study at present omits critical analyses and data depictions that would be of tremendous value. First, the correlations of original review scores with those obtained in the standard-format experimental condition have not been presented. This is key not just for the above-mentioned discrepancy for the score-matched group of applications, but as a more general issue of replication/repeatability for the NIH. Second, there is no direct presentation of the differences in the experimental scoring for the applications in redacted and standard guise.

    Overall, the study presents a highly credible attempt to investigate an issue of significance and impact for NIH and NIH-funded scientists. While there are significant caveats, the limitations are mostly transparent, and the main conclusions are supported. The study adds evidence to the obvious conclusion that the substantial bias against the funding of NIH proposals with Black PIs is a "death of a thousand cuts" scenario in which no single major cause can be found. Importantly, it shows that changing to blinded review is likely to have only a very limited impact, and that it will do so mostly by reducing the advantage for applications with White PIs that results from their identity being known.