From Double Dipping to Valid Inference: A Didactical and Comprehensive Review of Post-Selection Inference Methods for Multiple Linear Regression



Abstract

Researchers in psychology frequently combine exploratory data analysis (EDA) with confirmatory data analysis (CDA). In multiple linear regression, this practice often involves using the same dataset both to select predictors and to conduct statistical inference. This leads to invalid post-selection inference (PoSI), a violation of classical statistical assumptions that can produce biased standard errors, distorted p-values, and inflated Type I error rates. As a result, conclusions drawn from such analyses may be misleading or overstated. This paper provides a comprehensive and didactical review of the problem of invalid PoSI arising from data-driven variable selection in multiple linear regression. We illustrate how standard inferential procedures fail after model selection and demonstrate these issues through simulation studies conducted under different model specifications. We also discuss several established statistical approaches for obtaining valid inference after model selection, highlighting their assumptions, strengths, and limitations. Our goal is to make the issue of invalid PoSI accessible to applied psychological researchers and to offer practical guidance for conducting statistically valid inference following variable selection.
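
To make the "double dipping" problem concrete, the following is a minimal simulation sketch. It is not the simulation design used in the paper. It assumes a deliberately simplified setting: an outcome and ten candidate predictors are all independent noise, the "selection" step keeps the predictor most strongly correlated with the outcome, and the "inference" step is an ordinary t-test of that predictor's slope on the same data. The function names, the sample size of 50, the number of predictors, and the hardcoded critical value are illustrative assumptions.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 48 degrees of freedom
# (n - 2 for the assumed sample size n = 50 used below).
T_CRIT_DF48 = 2.0106


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t_stat(x, y):
    """t statistic for the slope in a simple regression of y on x."""
    r = correlation(x, y)
    n = len(x)
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate(n_reps=1000, n=50, p=10, seed=1):
    """Return rejection rates (pre-specified predictor, data-selected predictor).

    Every predictor is pure noise, so each rejection is a Type I error.
    The hardcoded critical value is only correct for n = 50.
    """
    rng = random.Random(seed)
    prespecified_rejections = 0
    selected_rejections = 0
    for _ in range(n_reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
        t_stats = [slope_t_stat(x, y) for x in predictors]
        # No selection: the first predictor was chosen before seeing the data.
        if abs(t_stats[0]) > T_CRIT_DF48:
            prespecified_rejections += 1
        # Double dipping: pick the strongest predictor, then test it on the same data.
        if max(abs(t) for t in t_stats) > T_CRIT_DF48:
            selected_rejections += 1
    return prespecified_rejections / n_reps, selected_rejections / n_reps


if __name__ == "__main__":
    prespecified, selected = simulate()
    print(f"Type I error, pre-specified predictor: {prespecified:.3f}")
    print(f"Type I error, data-selected predictor: {selected:.3f}")
```

Under these assumptions, the pre-specified test should reject close to the nominal 5% of the time, while the test on the data-selected predictor rejects far more often. If the ten tests were fully independent, the expected rate would be about 1 - 0.95^10 ≈ 0.40; because all tests share the same outcome, this figure is only approximate. Selection procedures used in practice (for example, stepwise selection within a multiple regression) are more involved, but the mechanism behind the inflation is the same.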
