Emulation of placebo-controlled index trials using observational data with cloning, censoring and weighting: Empirical assessment of constraints and credibility

Anna-Janina Stephan
Gerard Portela
Raisa Levin
Nils Krüger
Sebastian Schneeweiss
Rishi J. Desai

Abstract

Objective

Target trial emulation (TTE) has become a prominent approach to conducting observational effectiveness studies, yet limited attention has been paid to the nuances of emulating placebo-controlled trials in this framework using claims data. As a demonstration, we aimed to extend the evidence generated by the TOPCAT trial, which compared spironolactone versus placebo in patients with heart failure with preserved ejection fraction (HFpEF), to the U.S. HFpEF population.

Methods and Analysis

We estimated the observational analogue of the per-protocol effect for spironolactone initiation and continued use versus non-initiation in 2012-2020 Medicare claims with the clone-censor-weight approach. We evaluated two composite effectiveness endpoints, each combining heart failure hospitalization (HHF) and cardiac arrest with either all-cause or cardiovascular mortality, as well as each component except cardiac arrest as an individual endpoint. Anticipating threats to validity through residual confounding, we pre-specified two guardrails: 1) benchmarking against results from TOPCAT Americas, and 2) evaluation of non-cardiovascular mortality as a negative control outcome to quantify and correct for the magnitude of residual bias. To demonstrate investigator-induced biases frequently seen in studies not using the TTE framework, we additionally implemented a ‘naïve’ ever- versus never-user comparison that misclassified immortal person-time before spironolactone initiation as exposed.
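To make the immortal time problem in the naïve comparison concrete, the following is a minimal toy simulation, not the study's analysis and not an implementation of the clone-censor-weight approach. It assumes a constant mortality hazard that is unaffected by treatment (a true null effect), exponentially distributed initiation times, and invented parameter values; the function name `simulate_rate_ratios` and all parameters are hypothetical. It contrasts a naïve ever- versus never-user rate ratio, which counts person-time before initiation as exposed, with a rate ratio that assigns pre-initiation person-time to the unexposed group.

```python
import random


def simulate_rate_ratios(n=20000, follow_up=5.0, death_rate=0.2,
                         init_rate=0.3, seed=0):
    """Return (naive, time-aware) mortality rate ratios under a true null effect.

    All parameter values are illustrative assumptions, not study inputs.
    """
    rng = random.Random(seed)
    # Each entry holds [number of deaths, person-time].
    naive = {"exposed": [0, 0.0], "unexposed": [0, 0.0]}
    aware = {"exposed": [0, 0.0], "unexposed": [0, 0.0]}

    for _ in range(n):
        t_death = rng.expovariate(death_rate)  # does not depend on treatment
        t_init = rng.expovariate(init_rate)    # time of treatment initiation
        t_end = min(t_death, follow_up)        # end of observed follow-up
        died = int(t_death <= follow_up)

        if t_init < t_end:
            # Ever-user. Naive: all follow-up, including the immortal
            # period before initiation, is counted as exposed.
            naive["exposed"][0] += died
            naive["exposed"][1] += t_end
            # Time-aware: pre-initiation person-time is unexposed.
            aware["unexposed"][1] += t_init
            aware["exposed"][0] += died
            aware["exposed"][1] += t_end - t_init
        else:
            # Never-user: both analyses agree.
            for table in (naive, aware):
                table["unexposed"][0] += died
                table["unexposed"][1] += t_end

    def rate_ratio(table):
        exposed_rate = table["exposed"][0] / table["exposed"][1]
        unexposed_rate = table["unexposed"][0] / table["unexposed"][1]
        return exposed_rate / unexposed_rate

    return rate_ratio(naive), rate_ratio(aware)


naive_rr, aware_rr = simulate_rate_ratios()
print(f"naive ever- vs never-user rate ratio: {naive_rr:.2f}")
print(f"time-aware rate ratio:                {aware_rr:.2f}")
```

Under these assumptions the naïve ratio falls well below 1 even though treatment has no effect, while the time-aware ratio stays close to 1. Correct handling of person-time removes only this one bias; it does nothing about confounding, which is why the guardrails described above are still needed.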

Results

We included 320,881 patients with HFpEF in the overall Medicare cohort (mean age 80.6 years (SD 8.37); female 62%), of whom 49,729 qualified for benchmarking against TOPCAT. In the benchmarking cohort, relative risks with spironolactone use compared to non-use for effectiveness outcomes ranged from 0.97 (95%-CI = [0.94; 1.01]) for the composite with cardiovascular death to 1.14 (95%-CI = [1.11; 1.18]) for all-cause mortality. The negative control outcome of non-cardiovascular mortality suggested the presence of residual confounding. After bias correction, our relative risks were in line with TOPCAT hazard ratios for HHF-driven outcomes (e.g. composite with cardiovascular death 0.88 (95%-CI = [0.85; 0.91]) in our study vs. 0.82 (95%-CI = [0.69; 0.98]) in TOPCAT), but not for mortality outcomes (e.g. all-cause death 1.04 (95%-CI = [1.01; 1.07]) vs. 0.83 (95%-CI = [0.68; 1.02]) in TOPCAT). Estimates in the overall cohort were comparable to those in the benchmarking cohort. The naïve analysis of ever- versus never-use produced substantially biased results (e.g. 1.22 (95%-CI = [1.13; 1.30]) for the composite with cardiovascular death and 0.58 (95%-CI = [0.53; 0.65]) for all-cause death, benchmarking cohort).
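The abstract reports neither the negative control estimate nor the correction formula. As a hedged illustration only, the sketch below assumes the simplest common choice: the bias is multiplicative on the ratio scale, so a corrected relative risk is the uncorrected one divided by the negative control relative risk. Under that assumption, the reported before and after pairs can be used to back out the implied bias factor. The function names are hypothetical, and the inputs are the rounded point estimates quoted above, so the result is approximate.

```python
def implied_bias_factor(uncorrected_rr, corrected_rr):
    """Bias factor implied by a reported before/after pair.

    Assumes corrected = uncorrected / bias_factor (multiplicative bias).
    """
    return uncorrected_rr / corrected_rr


def apply_bias_correction(uncorrected_rr, negative_control_rr):
    """Divide a relative risk by the negative control relative risk."""
    return uncorrected_rr / negative_control_rr


# Rounded point estimates from the benchmarking cohort, as reported above.
composite_cv_death = implied_bias_factor(0.97, 0.88)
all_cause_death = implied_bias_factor(1.14, 1.04)

print(f"implied factor, composite with CV death: {composite_cv_death:.2f}")
print(f"implied factor, all-cause death:         {all_cause_death:.2f}")
```

Both pairs imply a factor of about 1.10, which is consistent with a single multiplicative correction applied across outcomes. It does not confirm that this is the method the authors used; the full article is the authority on that.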

Conclusion

In emulations of placebo-controlled trials, residual confounding remains a persistent threat and it is critical to build in pre-specified guardrails to detect and address this bias.

Key messages

What is already known on this topic – Target trial emulation presents a principled framework for designing observational studies, and within this framework, the clone-censor-weight approach has been recommended to avoid immortal time bias when emulating placebo-controlled trials.
What this study adds – Even after fully avoiding immortal time through the clone-censor-weight approach within the target trial framework, observational studies of non-use comparisons remain prone to other sources of bias. Bias analysis and benchmarking can help gauge the extent and direction of such bias.
How this study might affect research, practice or policy – This study showcases how researchers can leverage pre-specified benchmarking and net bias analysis as guardrails when using the clone-censor-weight design for non-use comparisons to ensure accurate interpretation. It also provides auxiliary evidence on the effects of spironolactone in HFpEF for the Medicare population beyond TOPCAT that may inform clinical decision-making.

Version published to 10.1101/2025.10.20.25337820 on medRxiv
Oct 29, 2025

