Adjustment for autocorrelation in multiple-group (controlled) interrupted time series analysis and its effect on power: A simulation study of the Newey-West and Prais-Winsten methods
Abstract
Background: Multiple-group (controlled) interrupted time series analysis (MG-ITSA) is increasingly used to evaluate interventions in healthcare settings when randomized designs are infeasible. Simulation-based power analysis has been proposed to support prospective study planning in this framework, but it remains unclear how different approaches to autocorrelation adjustment affect power and inferential performance. This study compares two commonly used methods, ordinary least squares with Newey-West standard errors (OLS-NW) and Prais-Winsten (PW) regression, in terms of power and other performance measures in MG-ITSA designs with first-order [AR(1)] autocorrelation.

Methods: A comprehensive Monte Carlo simulation study was conducted based on the standard MG-ITSA regression model. Data were generated under a wide range of design conditions, varying the number of time periods, the number of control units, the effect size, and the level of first-order autocorrelation. Treatment effects were defined as the difference-in-differences in level and the difference-in-differences in trend. In addition to power, several other performance measures were computed, including percentage bias, empirical standard errors, root mean squared error (RMSE), 95% confidence interval coverage, and Type I error rates. All analyses were conducted using the POWER_ITSA package for Stata.

Results: Across all scenarios, power increased with longer series, larger effect sizes, and more control units. OLS-NW consistently produced higher power than PW for both level and trend effects, with differences often exceeding 10%. Both methods yielded approximately unbiased estimates and nearly identical RMSE. However, the two methods differed markedly in inferential performance: OLS-NW resulted in substantially lower confidence interval coverage and markedly inflated Type I error rates, particularly in shorter series, whereas PW maintained near-nominal coverage and Type I error control across all conditions.

Conclusions: In MG-ITSA with AR(1) autocorrelation, the higher power achieved using OLS-NW comes at the cost of poor finite-sample inferential validity. PW provides more reliable hypothesis testing and uncertainty quantification, albeit with lower power. These findings highlight a fundamental trade-off between power and inferential calibration and underscore the importance of aligning estimator choice with study objectives when planning prospective MG-ITSA studies.
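To make the simulation design concrete, the following is a minimal NumPy sketch of the kind of Monte Carlo comparison described above. It is not the POWER_ITSA package and does not reproduce its settings. Several choices are assumptions made only for illustration: all function names are hypothetical, there is a single treated unit, the interruption falls at the midpoint of the series, the Newey-West lag is 1 with an n/(n-k) small-sample factor, Prais-Winsten uses one pooled autocorrelation estimate and a fixed number of iterations, and tests use the normal critical value 1.96 rather than a t distribution. The eight-column design matrix follows the commonly cited MG-ITSA specification, in which column 6 is the difference-in-differences in level and column 7 is the difference-in-differences in trend.

```python
import numpy as np

Z_CRIT = 1.96  # assumption: normal critical value for a two-sided 5% test

def ar1_errors(n, rho, rng):
    """Stationary AR(1) errors with unit-variance innovations."""
    u = rng.standard_normal(n)
    e = np.empty(n)
    e[0] = u[0] / np.sqrt(1.0 - rho ** 2)
    for i in range(1, n):
        e[i] = rho * e[i - 1] + u[i]
    return e

def simulate_units(n_periods, n_controls, rho, level_did, trend_did, rng):
    """Return [(X, y), ...]: one treated unit followed by n_controls controls."""
    t = np.arange(1.0, n_periods + 1)
    t0 = n_periods // 2 + 1              # assumption: interruption at the midpoint
    post = (t >= t0).astype(float)
    since = post * (t - t0)              # time since the interruption
    base = np.column_stack([np.ones(n_periods), t, post, since])
    units = []
    for u in range(n_controls + 1):
        z = 1.0 if u == 0 else 0.0       # unit 0 is the treated unit
        X = np.hstack([base, z * base])  # columns 4-7: group interactions
        y = z * (level_did * post + trend_did * since) + ar1_errors(n_periods, rho, rng)
        units.append((X, y))
    return units

def stack(units):
    return np.vstack([X for X, _ in units]), np.concatenate([y for _, y in units])

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def newey_west_se(units, beta, lag=1):
    """Bartlett-kernel HAC standard errors; lagged products stay within a unit."""
    X_all, _ = stack(units)
    n, k = X_all.shape
    bread = np.linalg.inv(X_all.T @ X_all)
    meat = np.zeros((k, k))
    for X, y in units:
        s = X * (y - X @ beta)[:, None]  # per-period scores x_t * e_t
        meat += s.T @ s
        for l in range(1, lag + 1):
            g = s[l:].T @ s[:-l]
            meat += (1.0 - l / (lag + 1.0)) * (g + g.T)
    cov = (n / (n - k)) * bread @ meat @ bread
    return np.sqrt(np.diag(cov))

def prais_winsten(units, n_iter=10):
    """Iterated Prais-Winsten with one pooled rho; returns (beta, se, rho)."""
    beta = ols(*stack(units))
    for _ in range(n_iter):
        num = sum((y - X @ beta)[1:] @ (y - X @ beta)[:-1] for X, y in units)
        den = sum((y - X @ beta)[:-1] @ (y - X @ beta)[:-1] for X, y in units)
        rho = float(np.clip(num / den, -0.99, 0.99))
        c = np.sqrt(1.0 - rho ** 2)      # scaling that retains the first observation
        transformed = [(np.vstack([c * X[:1], X[1:] - rho * X[:-1]]),
                        np.concatenate([c * y[:1], y[1:] - rho * y[:-1]]))
                       for X, y in units]
        Xt, yt = stack(transformed)
        beta = ols(Xt, yt)
    n, k = Xt.shape
    resid = yt - Xt @ beta
    cov = (resid @ resid / (n - k)) * np.linalg.inv(Xt.T @ Xt)
    return beta, np.sqrt(np.diag(cov)), rho

def rejection_rates(n_reps, n_periods, n_controls, rho, level_did, trend_did,
                    lag=1, seed=0):
    """Share of replications rejecting H0: level DiD = 0 under each method."""
    rng = np.random.default_rng(seed)
    hits = {"ols_nw": 0, "pw": 0}
    for _ in range(n_reps):
        units = simulate_units(n_periods, n_controls, rho, level_did, trend_did, rng)
        b_ols = ols(*stack(units))
        se_nw = newey_west_se(units, b_ols, lag)
        b_pw, se_pw, _ = prais_winsten(units)
        hits["ols_nw"] += int(abs(b_ols[6] / se_nw[6]) > Z_CRIT)
        hits["pw"] += int(abs(b_pw[6] / se_pw[6]) > Z_CRIT)
    return {name: count / n_reps for name, count in hits.items()}

# With a true effect of zero the rejection rate estimates the Type I error;
# with a nonzero effect it estimates power.
print(rejection_rates(200, n_periods=24, n_controls=3, rho=0.4,
                      level_did=0.0, trend_did=0.0))
```

Running the function once with `level_did=0.0` and once with a nonzero effect gives the two quantities at the center of the trade-off discussed above. The numbers from this sketch depend on the assumptions listed and should not be read as a replication of the reported results.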