Bias-Adjusted Predictions of County-Level Vaccination Coverage from the COVID-19 Trends and Impact Survey

Abstract

The potential for selection bias in nonrepresentative, large-scale, low-cost survey data can limit their utility for population health measurement and public health decision making. We developed an approach to bias adjust county-level COVID-19 vaccination coverage predictions from the large-scale US COVID-19 Trends and Impact Survey.

Design

We developed a multistep regression framework to adjust for selection bias in predicted county-level vaccination coverage plateaus. Our approach included poststratification to the American Community Survey, adjusting for differences in observed covariates, and secondary normalization to an unbiased reference indicator. As a case study, we prospectively applied this framework to predict county-level long-run vaccination coverage among children ages 5 to 11 y. We evaluated our approach against an interim observed measure of 3-mo coverage for children ages 5 to 11 y and used long-term coverage estimates to monitor equity in the pace of vaccination scale up.
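The poststratification step named above can be illustrated with a minimal sketch. It assumes that a model has already produced a predicted proportion for each demographic cell within each county, and that population counts for the same cells are available (for example, from the American Community Survey). The function name, the dictionary layout, and the example numbers are hypothetical and are not taken from the authors' code; the covariate adjustment and the secondary normalization are not shown.

```python
from collections import defaultdict


def poststratify(cell_predictions, cell_populations):
    """Return a population-weighted mean prediction per county.

    cell_predictions: {(county, cell): predicted proportion in [0, 1]}
    cell_populations: {(county, cell): population count for that cell}
    Both layouts are assumptions made for this illustration.
    """
    weighted_sum = defaultdict(float)
    total_population = defaultdict(float)
    for (county, cell), proportion in cell_predictions.items():
        # Cells without a population count contribute nothing.
        count = cell_populations.get((county, cell), 0)
        weighted_sum[county] += proportion * count
        total_population[county] += count
    # Skip counties with no population to avoid dividing by zero.
    return {
        county: weighted_sum[county] / total_population[county]
        for county in weighted_sum
        if total_population[county] > 0
    }


# Hypothetical inputs: two counties, cells defined by respondent age group.
predictions = {("A", "18-24"): 0.2, ("A", "25-34"): 0.4, ("B", "18-24"): 0.5}
populations = {("A", "18-24"): 100, ("A", "25-34"): 300, ("B", "18-24"): 50}
county_estimates = poststratify(predictions, populations)
```

Weighting each cell by its population count, rather than by its share of survey respondents, is what lets a nonrepresentative sample be rebalanced toward the demographic composition of each county.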

Results

Our predictions suggested a low ceiling on long-term national vaccination coverage (46%), detected substantial geographic heterogeneity (ranging from 11% to 91% across counties in the United States), and highlighted widespread disparities in the pace of scale up in the 3 mo following Emergency Use Authorization of COVID-19 vaccination for 5- to 11-y-olds.

Limitations

We relied on historical relationships between vaccination hesitancy and observed coverage, which may not capture rapid changes in the COVID-19 policy and epidemiologic landscape.

Conclusions

Our analysis demonstrates an approach to leverage differing strengths of multiple sources of information to produce estimates on the time scale and geographic scale necessary for proactive decision making.

Implications

Designing integrated health measurement systems that combine sources with different advantages across the spectrum of timeliness, spatial resolution, and representativeness can maximize the benefits of data collection relative to costs.

Highlights

The COVID-19 pandemic catalyzed massive survey data collection efforts that prioritized timeliness and sample size over population representativeness. The potential for selection bias in these large-scale, low-cost, nonrepresentative data has led to questions about their utility for population health measurement. We developed a multistep regression framework to bias adjust county-level vaccination coverage predictions from the largest public health survey conducted in the United States to date: the US COVID-19 Trends and Impact Survey. Our study demonstrates the value of leveraging differing strengths of multiple data sources to generate estimates on the time scale and geographic scale necessary for proactive public health decision making.

Article activity feed

  1. SciScore for 10.1101/2022.05.18.22275217:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentence: The first-stage logistic regression modeled the probability of parental hesitancy as a function of fixed effects for documented gender (male, female), age group (18-24, 25-34, 35-44, 45-54, 55-64, 65+), education (high school or fewer years of education, some college or a two-year degree, four-year degree, graduate degree), race/ethnicity (Hispanic, non-Hispanic American Indian or Alaska Native, non-Hispanic Asian, non-Hispanic Black, non-Hispanic Native Hawaiian or Other Pacific Islander, non-Hispanic White, non-Hispanic multiracial or other race), and age group of child (unknown, 12 to 17, and 5 to 11), with nested random intercepts on state and county. We did not perform a weighted regression to include the CTIS survey weights, instead adjusting for the probability of inclusion and non-response through post-stratification.
    Resource: non-Hispanic White
    Suggested: None
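
    The quoted sentence describes the first-stage model in words only. One way to write such a model is sketched below; the notation is an assumption introduced here for illustration, not the authors' own equation. Here y_i indicates hesitancy reported by respondent i, the beta terms are the fixed effects for each listed covariate, and u and v are random intercepts for state s and for county c nested within state.

    ```latex
    \operatorname{logit} \Pr(y_i = 1)
      = \beta_0
      + \beta^{\text{gender}}_{g[i]}
      + \beta^{\text{age}}_{a[i]}
      + \beta^{\text{educ}}_{e[i]}
      + \beta^{\text{race}}_{r[i]}
      + \beta^{\text{child}}_{k[i]}
      + u_{s[i]} + v_{c[i]},
    \qquad
    u_s \sim \mathcal{N}(0, \sigma_u^2), \quad
    v_c \sim \mathcal{N}(0, \sigma_v^2)
    ```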

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The results of our study should be interpreted in the context of several limitations. First, to predict plateau coverage for children ages 5 to 11 years we assumed that the relationship between hesitancy and coverage observed for children ages 12 to 17 applies to this younger age group. Our three-month validation supports this assumption, which is necessary for prospective estimation. Second, estimates of hesitancy for children of different age groups only became available in Wave 12 of the CTIS survey, and respondents are only asked about intentions to vaccinate their oldest child. Third, we rely on historical relationships between hesitancy and observed coverage, which will not capture the evolving COVID-19 policy and epidemiologic landscape. Fourth, our analytic framework is designed to capture geographic variation in coverage but not variation by other important population characteristics such as race/ethnicity within small geographic areas. Despite these limitations, our estimates reflect a principled approach to generating bias-adjusted estimates of vaccination coverage that can be used to inform decisions and evaluate actual progress against a reference scenario.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.