A plasmode simulation-based bias analysis for residual confounding by unmeasured variables leveraging information-rich subsets

Rishi J Desai
Shirley V Wang
Haritha S. Pillai
Mufaddal Mahesri
Bowen Gu
Joyce Lii
Sarah Dutcher
Chanelle Jones
Fatma M. Shebl
Marie C. Bradley
Wei Hua
Hana Lee
Gerald J. Dal Pan
Sebastian Schneeweiss
Robert Ball

Abstract

Background

Quantitative bias analyses often rely on unrealistic assumptions and do not fully reflect the complexities of healthcare data.

Methods

We describe a ‘plasmode’ simulation-based bias analysis for residual confounding from unmeasured variables that leverages granular information from a subset of cohort members. We generated 500 simulated cohorts from individual-level claims and linked electronic health record (EHR) data on new users of varenicline and bupropion at the Mass General Brigham site of the FDA Sentinel Real World Evidence Data Enterprise. Two adverse outcomes were simulated: 1) neuropsychiatric hospitalizations and 2) major adverse cardiovascular events (MACE). Measured confounders, identified from information available in claims (demographics, comorbid conditions, and comedications), were tailored to each outcome. Residual confounding was simulated using potential confounders that were measured in EHRs but unmeasured in claims: suicidal ideation for the neuropsychiatric outcome, and body mass index (BMI), blood pressure (BP), and smoking pack-years for the MACE outcome. The simulations retained the correlations between claims-based and EHR-based confounders observed in the empirical data, so that proxy adjustment for unmeasured confounders was reflected realistically. Analyses were conducted in the simulated data with and without adjustment for the EHR-based covariates to evaluate the extent of residual confounding in claims-only analyses.
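The general shape of such a plasmode analysis can be sketched in a few lines of Python. This is only an illustrative sketch under strong simplifying assumptions, not the authors' implementation: the outcome is continuous and linear (the study simulated hospitalization and MACE outcomes), the "empirical" cohort is a synthetic stand-in, and the function name `plasmode_bias` and all coefficient values are hypothetical. What it keeps from the description above is the core idea: whole patient rows are resampled so that treatment and the claims/EHR covariate correlations are preserved, an outcome with a known treatment effect is injected, and the effect is re-estimated with and without the EHR-only covariates.

```python
import numpy as np

def plasmode_bias(treat, claims_x, ehr_x, true_effect, beta_claims, beta_ehr,
                  n_sims=500, seed=0):
    """Relative bias per simulated cohort: column 0 = claims-only adjustment,
    column 1 = claims + EHR adjustment. Linear outcome is an assumption."""
    rng = np.random.default_rng(seed)
    n = len(treat)
    out = np.empty((n_sims, 2))
    for s in range(n_sims):
        # Resample whole patient rows so observed treatment and covariate
        # correlations carry over into each simulated cohort.
        idx = rng.integers(0, n, size=n)
        t, c, e = treat[idx], claims_x[idx], ehr_x[idx]
        # Inject a known treatment effect into a simulated outcome.
        y = true_effect * t + c @ beta_claims + e @ beta_ehr + rng.normal(size=n)
        for j, covs in enumerate((c, np.column_stack([c, e]))):
            design = np.column_stack([np.ones(n), t, covs])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            out[s, j] = (coef[1] - true_effect) / true_effect
    return out

# Synthetic stand-in for the linked claims-EHR subset (hypothetical values).
rng = np.random.default_rng(1)
n = 2000
claims_x = rng.normal(size=(n, 2))
# EHR-only confounder, partly predictable from a claims variable (a proxy).
ehr_x = 0.5 * claims_x[:, [0]] + rng.normal(size=(n, 1))
p_treat = 1 / (1 + np.exp(-(0.4 * claims_x[:, 0] + 0.4 * ehr_x[:, 0])))
treat = rng.binomial(1, p_treat).astype(float)

rel_bias = plasmode_bias(treat, claims_x, ehr_x, true_effect=1.0,
                         beta_claims=np.array([0.5, 0.2]),
                         beta_ehr=np.array([0.5]), n_sims=200)
print(np.median(rel_bias, axis=0))  # [claims-only, claims + EHR]
```

Comparing the two columns shows how much bias remains when the EHR-only confounder is left out; in a real application the resampled rows would come from the linked claims-EHR subset rather than from generated data.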

Results

Across the 500 simulations, the median absolute standardized mean difference (ASMD) between treatment groups in the unadjusted sample was 0.16 for suicidal ideation and <0.1 for BMI, BP, and smoking pack-years. For both outcomes, adjustment for claims-based variables yielded relative bias close to 0, suggesting that the EHR-measured confounders that were unmeasured in claims were unlikely to produce strong residual confounding in realistic simulations informed by empirical data.
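For readers unfamiliar with the two metrics reported here, the following Python sketch shows one common way to compute them. The abstract does not state the exact formulas used, so the pooled standard deviation in `asmd` and the simple ratio in `relative_bias` are assumptions, and both function names are hypothetical.

```python
import numpy as np

def asmd(x, treat):
    """Absolute standardized mean difference of covariate x between
    treatment groups, using the average of the two group variances
    (a common convention, assumed here)."""
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat)
    x1, x0 = x[treat == 1], x[treat == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return abs(x1.mean() - x0.mean()) / pooled_sd

def relative_bias(estimate, truth):
    """Difference between the estimated and the simulated (true) effect,
    expressed as a fraction of the true effect."""
    return (estimate - truth) / truth

# Tiny worked example: group means 2 and 5, both group variances equal to 1.
example_asmd = asmd([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0])
example_bias = relative_bias(1.1, 1.0)
```

An ASMD below 0.1 is often read as negligible imbalance, which is the threshold the reported results are compared against.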

Conclusion

The proposed approach provides a method for quantifying bias in non-randomized studies threatened by unavailability of potentially important confounding variables.

Key points

Residual confounding by unmeasured factors is a central threat in pharmacoepidemiology that is almost always acknowledged in published studies but seldom quantified.
We describe a plasmode simulation-based approach for systematically designing quantitative bias analyses that reflect the complexities of routinely collected healthcare data by leveraging detailed electronic health records from a subset of patients.
We provide open-source software code to enable other researchers to adopt this method in future studies and improve the reliability of their findings.

Plain language summary

This study introduces a new way for researchers to understand and measure the bias caused by missing health information in large insurance databases. Using detailed hospital records alongside insurance claims data, we created realistic computer simulations to test how much of the risk observed in safety studies could be explained by important health factors, such as depression or smoking habits, that are not always recorded in insurance data. The approach is flexible, uses real patient data, and helps researchers draw stronger, more reliable conclusions about the risks and benefits of treatments, even when some patient information is not available in all records.

Version published to 10.1101/2025.10.28.25338968 on medRxiv
Oct 31, 2025

