How to mitigate selection bias in COVID-19 surveys: evidence from five national cohorts

Abstract

Background

Non-response is a common problem in survey research, and it became more acute during the COVID-19 pandemic, when social distancing measures disrupted data collection. Non-response is often systematic: respondents tend to be healthier and from more advantaged socioeconomic backgrounds than non-respondents. This can introduce serious bias into research findings based on COVID-19 survey data. This study examined whether, despite systematic non-response, bias can be reduced and sample representativeness restored in the COVID-19 surveys embedded within five UK cohort studies by using the rich data available from previous time points.

Methods

A series of three surveys was conducted during the pandemic across five UK cohorts: National Survey of Health and Development (NSHD, born 1946), 1958 National Child Development Study (NCDS), 1970 British Cohort Study (BCS70), Next Steps (born 1989-90) and Millennium Cohort Study (MCS, born 2000-02). We applied non-response weights and multiple imputation, using covariates from previous waves that are commonly identified as predictors of non-response, to reduce bias and restore sample representativeness.
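To illustrate the idea behind non-response weighting, the following is a minimal sketch rather than the procedure used in the study. It assumes a simple weighting-class approach: each respondent is weighted by the inverse of the response rate observed in their stratum of a single baseline covariate. The data are synthetic, and the names `nonresponse_weights`, `weighted_share`, `parent_class` and `responded` are hypothetical. In practice, weights of this kind are typically derived from a response model that uses many covariates from earlier waves.

```python
from collections import defaultdict


def nonresponse_weights(records, stratum_key, responded_key):
    """Return one weight per record: 1 / stratum response rate for
    respondents, 0 for non-respondents (weighting-class approach)."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for r in records:
        totals[r[stratum_key]] += 1
        if r[responded_key]:
            responders[r[stratum_key]] += 1

    weights = []
    for r in records:
        if r[responded_key]:
            rate = responders[r[stratum_key]] / totals[r[stratum_key]]
            weights.append(1.0 / rate)
        else:
            weights.append(0.0)
    return weights


def weighted_share(records, weights, key, value):
    """Weighted proportion of records whose `key` equals `value`."""
    numerator = sum(w for r, w in zip(records, weights) if r[key] == value)
    return numerator / sum(weights)


# Synthetic cohort (assumption): 4 members with advantaged parental
# social class, 6 with other; the advantaged group responds more often.
cohort = (
    [{"parent_class": "advantaged", "responded": True}] * 3
    + [{"parent_class": "advantaged", "responded": False}] * 1
    + [{"parent_class": "other", "responded": True}] * 3
    + [{"parent_class": "other", "responded": False}] * 3
)

everyone = [1.0] * len(cohort)
unweighted = [1.0 if r["responded"] else 0.0 for r in cohort]
weights = nonresponse_weights(cohort, "parent_class", "responded")

full_share = weighted_share(cohort, everyone, "parent_class", "advantaged")
naive_share = weighted_share(cohort, unweighted, "parent_class", "advantaged")
adjusted_share = weighted_share(cohort, weights, "parent_class", "advantaged")

print(round(full_share, 3), round(naive_share, 3), round(adjusted_share, 3))
```

In this toy cohort the advantaged group makes up 40% of the full sample but 50% of respondents. Applying the weights brings the respondent-based estimate back to 40%. This works exactly here only because the weighting variable is the same variable being checked. With real data, the correction depends on how well the earlier-wave covariates predict both response and the outcome of interest.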

Results

Response rates in the COVID-19 surveys were lower than in previous cohort waves, especially in the younger cohorts. We identified bias due to systematic non-response in the distributions of variables including parental social class and childhood cognitive ability. In each cohort, compared with the original (full) cohort sample, respondents to the COVID-19 survey had a higher percentage of parents in the most advantaged social class and higher mean childhood cognitive ability. Applying non-response weights and multiple imputation reduced bias in both variables, and nearly eliminated it for parental social class.

Conclusions

This paper demonstrates that applying non-response weights or multiple imputation can reduce non-response bias and, to a large degree, restore sample representativeness in multiple waves of a COVID-19 survey embedded within long-running longitudinal cohort studies. Such embedded COVID-19 surveys therefore have an advantage over cross-sectional COVID-19 surveys, where non-response bias cannot be handled by leveraging previously observed information on non-respondents. Our findings suggest that, if non-response is appropriately handled, analyses based on the COVID-19 surveys within these five cohorts can contribute significantly to COVID-19 research, including the study of the medium and long-term effects of the pandemic.
