Exploring selection bias in COVID-19 research: Simulations and prospective analyses of two UK cohort studies

Abstract

Background

Non-random selection into analytic subsamples could introduce selection bias in observational studies of SARS-CoV-2 infection and COVID-19 severity (e.g. including only those who have had a COVID-19 PCR test). We explored the potential presence and impact of selection in such studies using data from self-report questionnaires and national registries.

Methods

Using pre-pandemic data from the Avon Longitudinal Study of Parents and Children (ALSPAC) (mean age=27.6 (standard deviation [SD]=0.5); 49% female) and UK Biobank (UKB) (mean age=56 (SD=8.1); 55% female) with data on SARS-CoV-2 infection and death-with-COVID-19 (UKB only), we investigated predictors of selection into COVID-19 analytic subsamples. We then conducted empirical analyses and simulations to explore the potential presence, direction, and magnitude of bias due to selection when estimating the association of body mass index (BMI) with SARS-CoV-2 infection and death-with-COVID-19.

Results

In both ALSPAC and UKB, a broad range of characteristics were related to selection, sometimes in opposite directions. For example, more educated participants were more likely to have data on SARS-CoV-2 infection in ALSPAC, but less likely in UKB. We found bias in many simulated scenarios. For example, in one scenario based on UKB, we observed an expected odds ratio of 2.56 compared to a simulated true odds ratio of 3, per standard deviation higher BMI.

Conclusion

Analyses using self-reported or national registry COVID-19 data may be biased due to selection. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor, and the assumed selection mechanism.

Key messages

Observational studies assessing the association of risk factors with SARS-CoV-2 infection and COVID-19 severity may be biased due to non-random selection into the analytic sample.
Researchers should carefully consider the extent to which their results may be biased due to selection, and conduct sensitivity analyses and simulations to explore the robustness of their results. We provide code for these analyses that is applicable beyond COVID-19 research.
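
As an illustration of the kind of simulation described above, the following is a minimal sketch and not the authors' published code. It assumes a binary exposure (for example, high versus low BMI) rather than the per-SD BMI measure used in the study, and every number in it (exposure prevalence, baseline odds, selection probabilities, seed) is a hypothetical value chosen for illustration, not a parameter taken from ALSPAC or UKB. Selection into the analytic sample depends on both exposure and outcome, and the odds ratio in the selected subsample is compared with the odds ratio in the full simulated population.

```python
import random


def odds_ratio(table):
    """Odds ratio from counts keyed by (exposure, outcome)."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])


def simulate_selection(n=200_000, true_or=3.0, baseline_odds=0.05,
                       p_exposed=0.3, p_select=None, seed=2021):
    """Simulate a binary exposure and outcome, then select a subsample.

    All defaults are hypothetical illustration values.
    p_select maps (exposure, outcome) to the probability of being
    included in the analytic sample (e.g. having test data).
    Returns (odds ratio in full population, odds ratio in selected sample).
    """
    if p_select is None:
        # Assumed mechanism: cases and exposed people are more likely selected.
        p_select = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.6, (1, 1): 0.7}
    rng = random.Random(seed)
    full = {key: 0 for key in p_select}
    selected = {key: 0 for key in p_select}
    for _ in range(n):
        x = 1 if rng.random() < p_exposed else 0
        # Outcome odds are multiplied by true_or among the exposed.
        odds = baseline_odds * (true_or if x else 1.0)
        y = 1 if rng.random() < odds / (1.0 + odds) else 0
        full[(x, y)] += 1
        if rng.random() < p_select[(x, y)]:
            selected[(x, y)] += 1
    return odds_ratio(full), odds_ratio(selected)


if __name__ == "__main__":
    full_or, selected_or = simulate_selection()
    print(f"Full population OR: {full_or:.2f}")
    print(f"Selected sample OR: {selected_or:.2f}")
```

In this binary simplification, the odds ratio in the selected sample equals the true odds ratio multiplied by s11 × s00 / (s10 × s01), where sxy is the selection probability for exposure x and outcome y. With the assumed probabilities that factor is about 0.58, so the selected-sample estimate is pulled towards the null. Changing `p_select` shows how the direction and size of the bias follow from the assumed selection mechanism.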

Article activity feed

SciScore for 10.1101/2021.12.10.21267363:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Ethics	not detected.
Sex as a biological variable	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.

Table 2: Resources

No key resources detected.

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Strengths and limitations: We used both empirical analyses and simulations to comprehensively investigate the potential presence and impact of selection bias in COVID-19 studies. We used two cohorts with pre-pandemic data allowing us to identify potential determinants of selection. We were able to compare across these cohorts that have contrasting sources of COVID-19 data (from questionnaires in ALSPAC and national registries in UKB). In addition, a strength of our simulations is that we based most of the parameters on either cohort data or other secondary sources to try to reflect realistic scenarios. In the analyses presented here we make several assumptions about or simplifications of the data. Both ALSPAC and UKB are subject to pre-pandemic selection bias due to non-random recruitment into these studies and loss to follow-up, which we do not account for here. Overall, we considered misclassification of the comparison groups (e.g. infected as non-infected) but not of the case groups (e.g. non-infected as infected). This may be particularly problematic for self-reported COVID-19 data and cause of death attributed to COVID-19 early in the pandemic [23]. We have focussed analyses here on the first wave of the COVID-19 pandemic in the UK. Selection bias may change over time as the pandemic progresses, which may explain some of the differences between ALSPAC and UKB. In ALSPAC, the comparison of SARS-CoV-2 (+) with everyone else, including participants who did not reply to the ...

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Related articles

Methodological Analysis of Bias Risks in Adaptive Multi-Arm Platform Trials: A Case-Series from Three COVID-19 Studies

Missing Data in OHCA Registries: How Multiple Imputation Methods Affect Research Conclusions—Paper II

The Bangladesh Healthcare Worker Cohort – Assessing the Longitudinal Impact of COVID-19 on Occupational and Psychological Health: Cohort Profile