Simulated sample splitting approach to address biases due to instrument selection and participant overlap in two-sample Mendelian Randomization studies

Abstract

Mendelian randomization (MR) is a popular statistical technique that uses genetic variants to explore causal relationships in observational epidemiology. Summary-level MR, the most common form, relies on published GWAS summary statistics to estimate causal effects between exposures and outcomes. However, empirical analyses often ignore Winner’s Curse in instrument effect estimates, weak instrument bias, and sample overlap. Our simulations and empirical examinations using the UK Biobank indicate that these mechanisms can induce substantial bias in routine MR approaches. We propose MR Simulated Sample Splitting (MR-SimSS), a novel method that corrects this bias and requires no additional data beyond the exposure and outcome GWAS summary statistics under examination. It operates by simulating statistically independent sets of summary statistics, analogous to those that would be produced by splitting the individual-level data into independent subsets, which can then be plugged into existing pleiotropy-robust MR methods. With sufficient instrument variants, MR-SimSS is robust to a range of sample overlap scenarios, providing a practical and modular solution to Winner’s Curse and weak instrument bias.
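
The splitting idea can be illustrated with a minimal single-trait sketch. This is not the authors' implementation: it assumes each full-sample estimate is normally distributed around the true effect with a known standard error, treats variants independently, and ignores the exposure-outcome correlation that a full method must account for under sample overlap. The function name `simulate_split` and the split fraction `pi` are hypothetical.

```python
import numpy as np


def simulate_split(beta_hat, se, pi=0.5, rng=None):
    """Simulate two independent sets of summary statistics from one set.

    Illustrative assumption: beta_hat ~ N(beta, se^2) for each variant.
    Adding and subtracting scaled copies of the same independent noise
    yields two estimates that are uncorrelated given the true effect,
    with the variances expected from subsamples holding fractions
    pi and 1 - pi of the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)

    z = rng.standard_normal(beta_hat.shape)  # independent auxiliary noise
    a = np.sqrt((1.0 - pi) / pi)

    beta_1 = beta_hat + a * se * z   # variance se^2 / pi
    beta_2 = beta_hat - se * z / a   # variance se^2 / (1 - pi)
    se_1 = se / np.sqrt(pi)
    se_2 = se / np.sqrt(1.0 - pi)
    return beta_1, se_1, beta_2, se_2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical full-sample estimates for three variants.
    beta_hat = np.array([0.10, -0.05, 0.02])
    se = np.array([0.01, 0.02, 0.015])
    b1, s1, b2, s2 = simulate_split(beta_hat, se, pi=0.5, rng=rng)
    print("simulated subset 1:", b1, s1)
    print("simulated subset 2:", b2, s2)
```

Under these assumptions, instruments could be selected using the first simulated set and their effects taken from the second, so that selection and estimation no longer share the same noise.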

Author summary

A central challenge in epidemiology is determining whether an observed association reflects a true cause-and-effect relationship. Mendelian randomization (MR) addresses this by using genetic variants as natural experiments to test whether a particular trait or exposure genuinely influences disease risk. However, when the same genetic data are used both to select and to estimate genetic instruments, MR results can become biased due to a phenomenon known as the Winner’s Curse. This problem, along with weak instruments and sample overlap between datasets, can distort causal estimates even in large studies. We introduce MR Simulated Sample Splitting (MR-SimSS), a new framework that overcomes these issues using only publicly available genome-wide association study (GWAS) summary statistics. MR-SimSS works by statistically simulating independent subsets of the data, without requiring access to individual-level information, allowing existing MR methods to be applied without bias. Through extensive simulations and analyses using UK Biobank data, we show that MR-SimSS provides more accurate and reliable causal estimates, offering a practical tool for robust causal inference in modern genetic epidemiology.
