From Research to Impact: Assessing a Decade of CDC’s Public Health Science by Topic Area, 2014-2023



Abstract

Objectives

This study provides an objective, in-depth overview of a large body of scientific output addressing public health. We apply topic modeling and bibliometric tools to explore the relevance and impact of a decade of CDC-authored publications.

Methods

We identified 34,104 scientific publications from 2014-2023 with ≥1 CDC-affiliated author using Science Clips, a CDC library database. We applied a large language modeling framework, BERTopic, to publication titles and abstracts to identify public health topic themes. We obtained data from Altmetric, Dimensions, and BMJ Impact Analytics for these publications to derive bibliometric indicators. We assessed the percentage of publications with online attention, academic citations, and policy citations using appropriate publication year ranges. For each topic area, we assessed the median Altmetric Attention Score, the median number of academic citations, and the percentage of publications with policy citations.
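The per-topic indicators described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`topic`, `citations`, `altmetric_score`, `policy_citations`) and all sample values are hypothetical, and the code assumes each publication has already been assigned to exactly one topic.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: field names and values are illustrative only,
# not drawn from the study's data.
publications = [
    {"topic": "topic_a", "citations": 20, "altmetric_score": 3, "policy_citations": 0},
    {"topic": "topic_a", "citations": 25, "altmetric_score": 5, "policy_citations": 2},
    {"topic": "topic_b", "citations": 10, "altmetric_score": 6, "policy_citations": 1},
    {"topic": "topic_b", "citations": 12, "altmetric_score": 11, "policy_citations": 0},
    {"topic": "topic_b", "citations": 8, "altmetric_score": 4, "policy_citations": 4},
]


def summarize_by_topic(records):
    """Compute three per-topic indicators from publication records."""
    by_topic = defaultdict(list)
    for rec in records:
        by_topic[rec["topic"]].append(rec)

    summary = {}
    for topic, recs in by_topic.items():
        # A publication counts as policy-cited if it has at least one policy citation.
        with_policy = sum(1 for r in recs if r["policy_citations"] > 0)
        summary[topic] = {
            "n": len(recs),
            "median_citations": median(r["citations"] for r in recs),
            "median_attention": median(r["altmetric_score"] for r in recs),
            "pct_policy_cited": 100 * with_policy / len(recs),
        }
    return summary


summary = summarize_by_topic(publications)
for topic, stats in sorted(summary.items()):
    print(topic, stats)
```

Medians are used here, as in the abstract, because citation and attention counts are typically skewed by a small number of highly cited papers.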

Results

Of publications from 2014-2020, 95% were cited by academic papers and 52% were cited in clinical guidance or policy. Of publications from 2014-2023, 84% garnered online attention. CDC-authored publications clustered into 46 public health topic themes. Among these, fungal infections had the highest median number of academic citations (36.5), mining safety and health had the highest proportion of papers with policy citations (92.5%), and substance abuse or opioids received the highest median public attention (Altmetric Attention Score = 14). Nearly a third of topics ranked highly (in the top 5) for at least one bibliometric indicator.
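The "top 5 for at least one indicator" count can be illustrated with a short sketch. The function name, topic names, and scores below are hypothetical, and the tie-breaking rule is an arbitrary choice made for this example, not something stated in the article.

```python
def topics_in_any_top_k(scores, k=5):
    """Return the topics that rank in the top k for at least one indicator.

    `scores` maps indicator name -> {topic: value}; higher values rank higher.
    Ties are broken alphabetically by topic name (an assumption of this sketch).
    """
    winners = set()
    for per_topic in scores.values():
        ranked = sorted(per_topic.items(), key=lambda kv: (-kv[1], kv[0]))
        winners.update(topic for topic, _ in ranked[:k])
    return winners


# Hypothetical scores for six topics; k=2 keeps the example small.
scores = {
    "median_citations": {"t1": 30, "t2": 22, "t3": 18, "t4": 15, "t5": 9, "t6": 7},
    "pct_policy_cited": {"t1": 40, "t2": 75, "t3": 90, "t4": 55, "t5": 60, "t6": 35},
    "median_attention": {"t1": 12, "t2": 4, "t3": 6, "t4": 3, "t5": 8, "t6": 2},
}

top = topics_in_any_top_k(scores, k=2)
share = len(top) / 6
print(sorted(top), round(share, 2))
```

Assuming the three indicators named in the abstract, at most 3 × 5 = 15 of the 46 topics can appear in some top-5 list, which is about 33%. A share of "nearly a third" would therefore suggest that the three lists overlap very little.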

Conclusions

Publications in this collection addressed an array of public health topic themes and demonstrated resonance within academic and policy arenas as well as with the public.

SUMMARY

1) What is the current understanding of this subject?

Public health science should address strategic priorities and translate to public health impact. CDC’s scientists publish over 3,400 articles per year on average, making it difficult to summarize the breadth of topics covered by these articles and how they affect downstream health outcomes.

2) What does this report add to the literature?

We used large language modeling techniques to categorize CDC-authored publications into 46 topic themes. These topic themes span both infectious and non-infectious disease and cover topics tied to strategic priorities such as emergency response. We assessed simple indicators, including academic citations, policy citations, and media attention, by topic area to show that all topic themes have had measurable impact. Across all topic areas, CDC-authored publications are also highly cited in policy and clinical guidance, an indicator of translation to public health impact.

3) What are the implications for public health practice?

Public health research programs can use advances in large language modeling, together with simple indicators of reach and impact, to better understand whether their publications address priority topics and translate into improved health outcomes. This overview of CDC research additionally promotes transparency about the activities of the nation’s foremost public health agency.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16485942.

    This paper presents a bibliometric analysis of publications of the US Centers for Disease Control and Prevention (CDC) in the period 2014-2023. Being an expert in bibliometrics, not in the research fields covered by CDC, my review focuses on the bibliometric aspects of this paper, not on the substance of the research carried out by CDC.

    From a bibliometric point of view, I consider this to be a sound and rigorous study. I don't have any major comments on the paper. Below I provide a few smaller comments.

    The section about topic modeling briefly mentions 'representative publications' in each cluster. It is not clear to me how these representative publications were identified.

    The section about model quality assessment mentions 8 reviewers, but then provides a breakdown of the reviewers into categories that in total seem to include 9 individuals. Are there 8 or 9 reviewers? Also, I wonder whether the reviewers are authors of the paper. If they aren't, I wonder whether their identities could be disclosed, for instance by mentioning the reviewers in the acknowledgments section.

    In the same section, I struggle to understand the second stage of the review of the quality of the topic clusters. In particular, I am not sure how to understand the reviewers' task to "blindly assign the topic labels created in the first stage to each paper". Some clarification would be helpful.

    "notably through the Relative Citation Ratio": It is not clear to me why the authors specifically mention the Relative Citation Ratio. To my knowledge, other normalization approaches are significantly more popular than the Relative Citation Ratio. The most frequently used bibliometric analytics platforms do not include the Relative Citation Ratio, while they do include other approaches.

    Regarding the analysis of policy citations, it would be helpful if the authors could offer a bit more reflection on the policy sources that are and aren't covered by the databases used by the authors. It is important to know the extent to which the policy sources most relevant for CDC are covered or not.

    Competing interests

    The author declares that they have no competing interests.