Assumption violation detection in linear regression: A(nother) cautionary tale
Abstract
Violating linear regression assumptions, specifically homoskedasticity, can noticeably affect the validity of null hypothesis significance tests on regression coefficients. Commercial software like SPSS already includes many tests for assumption violations, and robust inference alternatives can be applied just as easily. Hence, it may seem reasonable to first test the linear regression assumptions and then, depending on the outcome, decide whether to use a robust inference method instead of the classical method that relies on the assumptions being met. This simulation study evaluates the performance of such a two-step decision process in scenarios where neither, one, or both of the homoskedasticity and normality assumptions are violated. Additionally, we assess the general performance of four inference methods (classical, HC4, pairs bootstrap, and wild bootstrap) as well as tests of normality (Kolmogorov-Smirnov, Shapiro-Wilk, skewness, and z-kurtosis tests) and of homoskedasticity (Breusch-Pagan, modified Breusch-Pagan, White's, and the F-test) that are available in SPSS. We find that, if the goal is to generally detect deviations from normality, homoskedasticity, or a combination of both, the z-kurtosis test performs most consistently but has low power in small samples. Further, in small samples, we cannot recommend the typical two-step approach due to inflated type I error rates, whereas in large samples it provides no benefit over choosing a robust inference method without a prior assumption check. Thus, we generally cannot recommend the typical two-step approach to inference in linear regression problems.
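To make the two-step procedure under study concrete, the following is a minimal sketch of one pretest-then-decide path: fit OLS, pretest homoskedasticity (Breusch-Pagan LM test) and residual normality (Shapiro-Wilk), and fall back to HC4 heteroskedasticity-consistent standard errors only if a pretest rejects. The simulated data, the 0.05 pretest level, and the specific pairing of tests are illustrative assumptions, not the paper's exact simulation design.

```python
import numpy as np
from scipy import stats

# Illustrative data with heteroskedastic errors (variance grows with x).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2 + 0.3 * x)  # non-constant error SD

# Classical OLS fit via the normal equations.
X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Breusch-Pagan LM test: regress squared residuals on X; LM = n * R^2
# is asymptotically chi-squared with (k - 1) degrees of freedom.
u2 = resid ** 2
g = XtX_inv @ X.T @ u2
r2 = 1.0 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
p_bp = stats.chi2.sf(n * r2, df=k - 1)

# Shapiro-Wilk test of residual normality.
p_sw = stats.shapiro(resid).pvalue

# HC4 covariance: discount each squared residual by its leverage h_i,
# with exponent delta_i = min(4, n * h_i / k) (Cribari-Neto, 2004).
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)       # leverages, diag of hat matrix
delta = np.minimum(4.0, n * h / k)
omega = resid ** 2 / (1.0 - h) ** delta
V_hc4 = XtX_inv @ (X.T * omega) @ X @ XtX_inv

# Classical covariance assumes a constant error variance.
sigma2 = resid @ resid / (n - k)
V_ols = sigma2 * XtX_inv

# Two-step decision: robust standard errors only if a pretest rejects.
use_robust = (p_bp < 0.05) or (p_sw < 0.05)
se = np.sqrt(np.diag(V_hc4 if use_robust else V_ols))
print(f"BP p={p_bp:.3g}, SW p={p_sw:.3g}, robust={use_robust}, SE={se}")
```

With errors this strongly heteroskedastic, the Breusch-Pagan pretest rejects and the HC4 branch is taken; the study's point is that conditioning inference on such pretests can itself distort type I error rates in small samples.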