Mechanism Matters: A Monte Carlo Evaluation of Estimator Validity and Collider Bias in Environmental Mixture Epidemiology
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Background
Mixture epidemiology deploys sophisticated estimators, Bayesian kernel machine regression with causal mediation analysis (BKMR-CMA), quantile G-computation (QGC), and parametric G-computation, alongside conventional regression. Comparative evaluations have assumed additive, non-mediated data-generating processes, leaving conditions under which estimator choice determines causal validity uncharacterized.
Methods
We developed a simulation framework using military-relevant exposure distributions (metals, per- and polyfluoroalkyl substances [PFAS], polychlorinated biphenyls [PCBs]) and allostatic load (AL) across three deployment tiers, with parameters drawn from military occupational health and contamination literature. Four data-generating processes were specified as directed acyclic graphs: direct effects with confounding (M1), full mediation through AL (M2), synergistic AL-exposure interaction (M3), and collider structure (M4). We evaluated ordinary least squares (OLS), QGC, G-computation, and BKMR-CMA on bias, root mean squared error, and 95% confidence interval coverage across 500 Monte Carlo replications at n = 500 and n = 1,000.
Results
No estimator dominated across all mechanisms. Under M1, OLS and G-computation produced near-identical modest positive bias; BKMR-CMA achieved lower root mean squared error through kernel shrinkage. Under M2, BKMR-CMA exhibited severe positive bias for AL (mean bias = +0.579 SD units; coverage = 32.8%). Under M3, BKMR-CMA was the only estimator achieving nominal 95% coverage for AL (95.2%), while regression-based approaches fell to 83.6%. Under M4, G-computation produced persistent bias and near-zero coverage for lead, reflecting structural non-identification.
Conclusions
Estimator validity is fundamentally mechanism-dependent. Researchers should base estimator choice on explicit causal assumptions about whether AL functions as confounder, mediator, moderator, or collider, particularly in military and occupational cohorts. We provide a mechanism-to-estimator mapping for applied researchers.
What this study adds
No single estimator dominates across causal mechanisms in mixture epidemiology. Using a military-relevant simulation, we show that BKMR-CMA excels under synergistic interactions but fails severely under mediation, while G-computation fails under collider structures that standard regression cannot detect. Estimator choice is a causal decision, not a technical one. We provide a mechanism-to-estimator mapping to guide researchers in occupational and environmental cohort studies where stress-related biomarkers function simultaneously as confounders, mediators, and colliders.