Analysis of science journalism reveals gender and regional disparities in coverage

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work. While caveats are inevitable in this type of study, the evidence for the authors' claims is convincing, with a rigorous, and importantly, reproducible analysis of over 20,000 articles from across 15 years. This paper should be of interest to all scientists and science journalists, as well as to those who study science communication.

This article has been Reviewed by the following groups

Abstract

Science journalism is a critical way for the public to learn about and benefit from scientific findings. Such journalism shapes the public’s view of the current state of science and legitimizes experts. Journalists can only cite and quote a limited number of sources, whom they may discover in their research, including through recommendations by other scientists. Biases in either process may influence who is identified and ultimately included as a source. To examine potential biases in science journalism, we analyzed 22,001 non-research articles published by Nature and compared these with Nature-published research articles with respect to predicted gender and name origin. We extracted cited authors’ names and those of quoted speakers. While citations and quotations within a piece do not reflect the entire information-gathering process, they can provide insight into the demographics of visible sources. We then predicted gender and name origin of the cited authors and speakers. We compared articles with a comparator set made up of first and last authors within primary research articles in Nature and a subset of Springer Nature articles in the same time period. In our analysis, we found a skew toward quoting men in Nature science journalism. However, quotation is trending toward equal representation at a faster rate than authorship rates in academic publishing. Gender disparity in Nature quotes was dependent on the article type. We found a significant over-representation of names with predicted Celtic/English origin and under-representation of names with a predicted East Asian origin in both extracted quotes and journal citations, although the effect was dampened in citations.

Article activity feed

  1. eLife assessment

    This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work. While caveats are inevitable in this type of study, the evidence for the authors' claims is convincing, with a rigorous, and importantly, reproducible analysis of over 20,000 articles from across 15 years. This paper should be of interest to all scientists and science journalists, as well as to those who study science communication.

  2. Reviewer #1 (Public Review):

    This manuscript studies the representation by gender and name origin of authors from Nature and Springer Nature articles in Nature News. The representation of author identities is an important step towards equality in science, and the authors found that women are underrepresented in news quotes and mentions with respect to the proportion of women authors.

    Strengths:

    The research is rigorously conducted. It presents relevant questions and compelling answers. The documentation of the data and methods is thoroughly done, and the authors provide the code and data for reproduction.

    Weaknesses:

    The article is not clearly structured, which makes it hard to follow. Better framing, contextualization, and conceptualization of the analysis would help readers understand the results. Some definitions are unclear, and some key concepts are worded incorrectly.

  3. Reviewer #2 (Public Review):

    This paper set out to investigate disparities in how authors of scientific papers are quoted in the context of science journalism. Quotations, the authors argue, reveal who a science journalist approaches as a source and thus who is considered an expert. At the same time, quotation in the news legitimizes experts and signals the importance of their perspective and opinions. It is therefore important to identify disparities in quotation, both as a matter of justice and to ensure the representation of diverse viewpoints in journalism.

    Here, the authors investigate disparities in quotation based on the gender and national origin of experts. They focus on science journalism in non-research articles published in the journal Nature. Articles are scraped from the Nature website and, using established NLP tools, the article content is parsed for quotations and the names of the scientists being quoted. The gender and national origin of scientists are inferred from their names and from gendered pronouns used in the text. The rates of quotation by gender and national origin are then compared to the (also inferred) demographics of authors of research articles published in Nature; this establishes a baseline for comparing who is quoted vs. who is actually doing research. Based on these data, a variety of analyses are presented showing various aspects of bias and disparity in who is quoted in science journalism.

    From their analysis, the authors make the following claims:

    • Authors inferred as men were over-represented in quotations in journalistic Nature articles relative to their share of first and last authors in Nature.

    • Quotation is sharply trending towards gender parity, with variation by the type of article.

    • Authors with names inferred as originating from Celtic/English regions were over-represented, whereas authors with names inferred as originating from East Asia were heavily under-represented in quotations.

    • The representation of authors with inferred East Asian names has increased faster among the last authors of research articles in Nature than it has in journalistic quotation.

    Claims 2-4 are solidly supported by the evidence presented in the manuscript. Claim 1 is supported by the evidence, but with some caveats. Support for Claim 1 depends on whether Nature's first or last authors are the most appropriate comparison set; if the last authors are the most appropriate, then Claim 1 only holds for 2005 through 2010. I expand on this point below.

    I praise the manuscript and the authors for their commitment to reproducibility. Supplied with the paper is all the data (where possible) and code necessary to reproduce the results, as well as a Docker image that ensures that it can be re-executed far into the future.

    The analyses conducted are methodologically rigorous. The authors provide bootstrapped confidence intervals for all analyzed values, choose appropriate baselines, and validate their name inference approach. In addition, I found their analysis comprehensive. By this I mean that they sufficiently explored their data to support their claims; nearly every caveat or limitation I could think of while reading was appropriately addressed either in the main or in a supplemental figure or table.
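    To make the idea of a bootstrapped confidence interval concrete, the following is a minimal sketch of a percentile bootstrap for a proportion, such as the share of quotes attributed to speakers with a given predicted gender. It is an illustration only: the function name, the toy labels, and the choice of the percentile method are assumptions, and the manuscript's own resampling procedure may differ.

```python
import random


def bootstrap_proportion_ci(labels, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the share of True values.

    labels: list of booleans (hypothetical encoding, e.g. True means the
    quoted speaker's predicted gender is "man").
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    estimates = []
    for _ in range(n_resamples):
        # Resample the observations with replacement, keeping the sample size.
        resample = rng.choices(labels, k=n)
        estimates.append(sum(resample) / n)
    estimates.sort()
    # Read the interval bounds off the sorted resampled estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(labels) / n, lower, upper


# Made-up toy input, not data from the manuscript.
toy_labels = [True] * 70 + [False] * 30
point, lower, upper = bootstrap_proportion_ci(toy_labels)
print(f"share = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

    The width of the interval shrinks as the number of observations grows, which matters when reading per-year or per-article-type results based on few quotes.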

    While a good paper, it is not without weaknesses. The paper is generally well-written, and the visualizations do a good job of communicating results. There is, of course, room to improve on both. In some cases, the manuscript lacks consistency in terminology and uses odd word choices (e.g., "enrichment" and "depletion" when discussing representation). While this paper is methodologically rigorous and professional in its presentation, I feel that the authors could have done a better job of interpreting and contextualizing their findings. Specifically, readers should be aware of the caveats regarding Claim 1 (listed above), the limits of generalizing these findings to other areas of science journalism, and a somewhat shallow discussion section that I believe detracts from the study's significance. I outline these points in more detail below.

    Despite these quibbles, the authors find solid support for their claims and achieve their goals. This paper, I believe, will be of general interest to scientists and science communicators, to those interested in science communication as a field, to meta-scientists, and to those aiming to improve diversity and equity in the scientific process.

    Caveats to Claim 1:

    One of the claims made by the authors (Claim 1) is that quotations in the dataset skew towards men. I find this true, but with two related caveats: that it depends on the choice of comparator set, and that it changes over time.

    The authors assess the representation of quotation by comparison to either Nature's first authors or its last authors. However, the authors do not discuss whether one is more appropriate, and what is implied if, say, quotations match the last authors but not the first authors. In most scientific fields, the last author corresponds to the conceptual lead of a paper and is often the corresponding author, who is most likely to be contacted to discuss the paper's significance. First authors, in contrast, often represent the "driver" of the project: the person who does most of the actual work and who is usually a student or more junior researcher. This distinction is important because a case could be made for either being the more appropriate comparator: last authors due to their seniority, first authors due to their closeness to the study and (typically) greater diversity.

    The choice of comparator set becomes an issue because, as per Claim 2, the representation of women is increasing over time. Claim 1 only holds for the last authors from 2005 through 2010, and after 2018 women are quoted at a higher rate than the demographics of the last authors would suggest. For the first authors, Claim 1 holds through 2017, after which quotations are representative, or slightly over-representative, of women authors.

    So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is a critical flaw in the paper (the authors do discuss the trend in Claim 2), but interpretation of this claim should take these caveats into account, and readers should consider the important differences between first and last authorship.
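    The dependence on comparator set and year can be illustrated with a small sketch. It assumes a simplified decision rule in which a year counts as showing over-representation of men in quotes when the comparator's share of men falls below the confidence interval for the quoted share. The function name, the rule, and all numbers are hypothetical and are not taken from the manuscript.

```python
def representation_by_year(quote_ci, comparator_share):
    """Label each year by where the comparator share falls relative to the quote CI.

    quote_ci: {year: (lower, upper)} interval for the share of quotes
        attributed to predicted men (hypothetical input format).
    comparator_share: {year: share of predicted men in the comparator set}.
    """
    labels = {}
    # Only years present in both inputs can be compared.
    for year in sorted(quote_ci.keys() & comparator_share.keys()):
        lower, upper = quote_ci[year]
        share = comparator_share[year]
        if share < lower:
            labels[year] = "men over-represented in quotes"
        elif share > upper:
            labels[year] = "men under-represented in quotes"
        else:
            labels[year] = "no clear difference"
    return labels


# Made-up numbers chosen only to show that the verdict can flip
# with the comparator set; they are not results from the manuscript.
toy_quote_ci = {2005: (0.80, 0.90), 2020: (0.55, 0.65)}
toy_first_authors = {2005: 0.70, 2020: 0.60}
toy_last_authors = {2005: 0.78, 2020: 0.75}

for name, comparator in [("first", toy_first_authors), ("last", toy_last_authors)]:
    print(name, representation_by_year(toy_quote_ci, comparator))
```

    A fuller treatment would also carry uncertainty for the comparator share rather than treating it as a fixed number.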

    Generalizability to other contexts of science journalism:

    Journalistic articles in Nature may not be representative of all contexts of science journalism. Nature has a unique readership, consisting of scientists from many disciplines who have not only a generalist interest in science but also an interest in aspects of science as a profession. Science journalism as a whole, however, is part of the broader landscape of mainstream media, consisting of outlets such as ABC, BBC, and Scientific American. The audiences for these outlets will be more general, less interested in science as a career, and will likely have a different appetite for direct quotations and for more technical topics.

    This does not make the study bad. On the contrary, the authors' focus on Nature allowed for many interesting analyses, but their findings should still be understood as coming from a specific context. While the authors outline many limitations of their study, they do not grapple with the limits of its generalizability, or with which aspects of their analysis might translate to other contexts of science journalism. For example, part of the trend towards gender parity in quotation is explained by the higher representation of women in the "Career Feature" article type. However, this article type will likely not be present in more general-interest contexts, which would affect the representation of women.

    Shallow discussion:

    I feel that the authors missed an opportunity to use their discussion to not only properly contextualize their results, but also explore their significance. In broad terms, there is literature on science journalism, its consequences for science, and the impact on public perceptions, as well as a continuous meta-discourse on journalistic ethics and best practices. The authors pay lip service to some of these themes but do little to actually place their findings in the broader discourse. Below, I provide a few specific points that could be further discussed:

    What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names?

    The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

    Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about scientific quotation that first authors appear more likely to be quoted? Does this mean that the division of labor is changing such that the first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or are giving away their chance to comment on a study to the first author?

    Moreover, there are several findings in the study which are notable but don't seem to have been mentioned at all in the discussion.

    Below I highlight a few:

    • According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications.

    • Those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?