Post-matching analysis after coarsened exact matching: implications of coarsening for residual confounding and model dependence
Abstract
Background: Coarsened Exact Matching (CEM) is a widely used design strategy aimed at reducing confounding in observational studies by matching treated and control units within strata of coarsened covariates. It is often promoted as a method that mimics a randomized block design, which has led many researchers to apply simple, unadjusted statistical methods (such as paired t-tests or McNemar’s test) originally developed for blocked randomized designs. However, CEM only ensures balance on the coarsened scale, and residual imbalances may remain on the original covariate scale, raising questions about the appropriateness of unadjusted analyses as the primary analytic approach.

Methods: We examine the implications of this coarsening process for post-matching analysis using literature review, conceptual arguments, and simulation studies. In particular, we evaluate how within-stratum heterogeneity in the original covariates affects residual confounding and the dependence of treatment effect estimates on outcome model specification.

Results: Our results show that matching on coarsened covariates can leave systematic differences between treated and control subjects within matched strata. These differences introduce residual confounding that does not disappear with increasing sample size. Simulation results further demonstrate a bias–variance trade-off induced by coarsening: fine coarsening may reduce residual confounding but can result in substantial data loss, whereas coarse binning preserves sample size at the cost of increased bias and greater reliance on outcome model specification.

Conclusions: CEM should be regarded primarily as a preprocessing tool for improving covariate overlap rather than as a stand-alone solution for confounding control. Valid causal inference following CEM generally requires regression adjustment using the original, uncoarsened covariates, and unadjusted analyses of matched data may yield biased treatment effect estimates.
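The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the single uniform confounder, the treatment model, the outcome coefficients, the sample size, and the function name `cem_estimates` are all assumptions chosen for illustration. It coarsens one covariate into equal-width bins, computes an unadjusted stratum-weighted difference in means, and compares it with a regression on the matched sample that includes the original, uncoarsened covariate.

```python
import numpy as np


def cem_estimates(n_bins, n=20000, true_effect=1.0, seed=0):
    """Return (unadjusted, adjusted) effect estimates after CEM on one covariate.

    Every setting here is an illustrative assumption, not the paper's design.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)             # original (uncoarsened) confounder
    treated = rng.random(n) < 0.1 + 0.8 * x  # treatment is more likely at high x
    y = true_effect * treated + 3.0 * x + rng.normal(0.0, 1.0, n)

    # Coarsen x into equal-width bins; each bin is one CEM stratum.
    strata = np.minimum((x * n_bins).astype(int), n_bins - 1)

    diffs, weights = [], []
    matched = np.zeros(n, dtype=bool)
    for s in range(n_bins):
        t_s = (strata == s) & treated
        c_s = (strata == s) & ~treated
        if t_s.sum() == 0 or c_s.sum() == 0:
            continue                          # strata lacking either group are pruned
        matched |= t_s | c_s
        diffs.append(y[t_s].mean() - y[c_s].mean())
        weights.append(t_s.sum())             # weight by treated count (ATT-style)
    unadjusted = float(np.average(diffs, weights=weights))

    # Regression adjustment on the matched sample: treatment indicator,
    # original x, and stratum indicators (which absorb the intercept).
    design = np.column_stack([
        treated[matched].astype(float),
        x[matched],
        np.eye(n_bins)[strata[matched]],
    ])
    coef, *_ = np.linalg.lstsq(design, y[matched], rcond=None)
    return unadjusted, float(coef[0])


if __name__ == "__main__":
    for n_bins in (2, 20):
        unadj, adj = cem_estimates(n_bins)
        print(f"bins={n_bins:2d}  unadjusted={unadj:.3f}  adjusted={adj:.3f}")
```

Under these assumptions, the unadjusted estimate with two wide bins sits noticeably above the true effect, because treated units have systematically higher x than controls inside each bin, and that gap is a property of the bin width rather than of the sample size. Adjusting for the original x recovers an estimate close to the true effect here only because the assumed outcome model is correctly specified as linear. With a single uniform covariate, finer bins cost almost no data; the data-loss side of the trade-off would appear mainly with several covariates or sparse regions, which this sketch does not attempt to reproduce.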