Exploring selection bias in COVID-19 research: Simulations and prospective analyses of two UK cohort studies
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Background
Non-random selection into analytic subsamples could introduce selection bias in observational studies of SARS-CoV-2 infection and COVID-19 severity (e.g. including only those who have had a COVID-19 PCR test). We explored the potential presence and impact of selection in such studies using data from self-report questionnaires and national registries.
Methods
Using pre-pandemic data from the Avon Longitudinal Study of Parents and Children (ALSPAC) (mean age=27.6 (standard deviation [SD]=0.5); 49% female) and UK Biobank (UKB) (mean age=56 (SD=8.1); 55% female) with data on SARS-CoV-2 infection and death-with-COVID-19 (UKB only), we investigated predictors of selection into COVID-19 analytic subsamples. We then conducted empirical analyses and simulations to explore the potential presence, direction, and magnitude of bias due to selection when estimating the association of body mass index (BMI) with SARS-CoV-2 infection and death-with-COVID-19.
Results
In both ALSPAC and UKB, a broad range of characteristics were related to selection, sometimes in opposite directions. For example, more educated participants were more likely to have data on SARS-CoV-2 infection in ALSPAC, but less likely in UKB. We found bias in many simulated scenarios. For example, in one scenario based on UKB, we observed an expected odds ratio of 2.56 compared to a simulated true odds ratio of 3, per standard deviation higher BMI.
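The kind of simulation described above can be illustrated with a minimal sketch. This is not the authors' code, and every numeric parameter below (the outcome intercept of -2, and the selection-model coefficients -1, 0.5 and 2) is an illustrative assumption rather than a value taken from ALSPAC or UKB. The sketch simulates standardised BMI, generates infection with a true odds ratio of 3 per SD, lets the probability of being tested depend on both BMI and infection, and then compares the odds ratio estimated in the full sample with the one estimated among the tested only.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def simulate(n=100_000, true_or=3.0, seed=1):
    """Simulate BMI (in SD units), infection, and selection into testing.

    All coefficients are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    bmi, infected, tested = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(-2.0 + math.log(true_or) * z) else 0
        # Assumed selection mechanism: higher BMI and being infected
        # both make a participant more likely to have test data.
        s = rng.random() < expit(-1.0 + 0.5 * z + 2.0 * y)
        bmi.append(z)
        infected.append(y)
        tested.append(s)
    return bmi, infected, tested


bmi, infected, tested = simulate()

# Odds ratio per SD of BMI in the full (unselected) sample.
_, slope_full = fit_logistic(bmi, infected)
or_full = math.exp(slope_full)

# Odds ratio per SD of BMI when restricting to tested participants.
sel_bmi = [z for z, s in zip(bmi, tested) if s]
sel_inf = [y for y, s in zip(infected, tested) if s]
_, slope_sel = fit_logistic(sel_bmi, sel_inf)
or_selected = math.exp(slope_sel)

print(f"Full sample OR per SD BMI:   {or_full:.2f}")
print(f"Tested-only OR per SD BMI:   {or_selected:.2f}")
```

Under these assumed parameters the tested-only estimate is attenuated relative to the full-sample estimate. Changing the sign or size of the selection coefficients changes the direction and magnitude of the discrepancy, which is the reason for exploring several scenarios.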
Conclusion
Analyses using COVID-19 self-reported or national registry data may be biased due to selection. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor, and the assumed selection mechanism.
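Why the bias depends on the selection mechanism can be seen from a short identity that follows from Bayes' rule. The notation is introduced here for illustration and is not taken from the article: X is the risk factor (e.g. BMI), Y the binary outcome, and S = 1 indicates selection into the analytic subsample.

```latex
\frac{P(Y=1 \mid X=x,\, S=1)}{P(Y=0 \mid X=x,\, S=1)}
  = \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)}
    \times
    \frac{P(S=1 \mid X=x,\, Y=1)}{P(S=1 \mid X=x,\, Y=0)}
```

The odds of the outcome among the selected equal the odds in the full population multiplied by a ratio of selection probabilities. If that ratio does not vary with x, it cancels when two values of x are compared and the odds ratio is unaffected; if it does vary with x, the odds ratio in the selected sample is shifted up or down accordingly.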
Key messages
- Observational studies assessing the association of risk factors with SARS-CoV-2 infection and COVID-19 severity may be biased due to non-random selection into the analytic sample.
- Researchers should carefully consider the extent to which their results may be biased due to selection, and conduct sensitivity analyses and simulations to explore the robustness of their results. We provide code for these analyses that is applicable beyond COVID-19 research.
Article activity feed
SciScore for 10.1101/2021.12.10.21267363:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics: not detected.
Sex as a biological variable: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Strengths and limitations: We used both empirical analyses and simulations to comprehensively investigate the potential presence and impact of selection bias in COVID-19 studies. We used two cohorts with pre-pandemic data allowing us to identify potential determinants of selection. We were able to compare across these cohorts that have contrasting sources of COVID-19 data (from questionnaires in ALSPAC and national registries in UKB). In addition, a strength of our simulations is that we based most of the parameters on either cohort data or other secondary sources to try to reflect realistic scenarios. In the analyses presented here we make several assumptions about or simplifications of the data. Both ALSPAC and UKB are subject to pre-pandemic selection bias due to non-random recruitment into these studies and loss to follow-up, which we do not account for here. Overall, we considered misclassification of the comparison groups (e.g. infected as non-infected) but not of the case groups (e.g. non-infected as infected). This may be particularly problematic for self-reported COVID-19 data and cause of death attributed to COVID-19 early in the pandemic [23]. We have focussed analyses here on the first wave of the COVID-19 pandemic in the UK. Selection bias may change over time as the pandemic progresses, which may explain some of the differences between ALSPAC and UKB. In ALSPAC, the comparison of SARS-CoV-2 (+) with everyone else, including participants who did not reply to the ...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.