Controls for non-independence should reflect the data-generating process
Abstract
In our article, we simulated national-level trait data with non-independence arising from either spatial diffusion or shared language ancestry. These are plausible scenarios against which we can evaluate the ability of different methods to recover the “true” cross-national correlation. We used these simulations to evaluate widely used statistical controls for non-independence and showed that these methods do not reduce false positives to a satisfactory degree. This finding holds, and is concerning, regardless of our ability to identify an alternative class of models that performs better. We then showed that Bayesian random effects models incorporating explicit assumptions about the data-generating process do indeed perform much better. In particular, the Bayesian model controlling for spatial proximity was most effective at reducing false positives in spatially non-independent data, and the Bayesian model controlling for linguistic proximity was most effective at reducing false positives in “language tree” non-independent data.
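The logic of such a simulation can be illustrated with a minimal sketch. It is not the simulation or the Bayesian random effects model from the article: the 30 hypothetical "nations", their random coordinates, the exponential distance-decay covariance, and the length scale are all assumptions chosen for illustration. In place of a Bayesian model, the sketch swaps in a simpler frequentist step, whitening both traits with the covariance that generated them (assumed known here), to show why a control that matches the data-generating process brings false positives back toward the nominal rate. The function name `false_positive_rates` is likewise hypothetical.

```python
import math
import random

N_NATIONS = 30   # hypothetical number of nations (assumption)
T_CRIT = 2.048   # two-sided 5% critical t value for df = N_NATIONS - 2 = 28


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ x = b for lower-triangular low."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rates(n_sims=400, length_scale=0.5, seed=1):
    """Simulate pairs of truly uncorrelated, spatially autocorrelated traits.

    Returns (naive_rate, whitened_rate): the share of simulations in which a
    5% correlation test is significant before and after whitening.
    """
    rng = random.Random(seed)
    # Random nation coordinates in the unit square (illustrative only).
    coords = [(rng.random(), rng.random()) for _ in range(N_NATIONS)]
    # Exponential distance decay: nearby nations have more similar trait values.
    cov = [[math.exp(-math.dist(p, q) / length_scale) for q in coords]
           for p in coords]
    low = cholesky(cov)
    # Critical |r| equivalent to the t-test threshold above.
    r_crit = T_CRIT / math.sqrt(N_NATIONS - 2 + T_CRIT ** 2)

    def draw_trait():
        # Multiply independent normal draws by L to induce covariance cov.
        z = [rng.gauss(0.0, 1.0) for _ in range(N_NATIONS)]
        return [sum(low[i][k] * z[k] for k in range(i + 1))
                for i in range(N_NATIONS)]

    naive_hits = whitened_hits = 0
    for _ in range(n_sims):
        x, y = draw_trait(), draw_trait()  # independent: true correlation is 0
        if abs(pearson(x, y)) > r_crit:
            naive_hits += 1
        # Whitening with the generating covariance removes the spatial structure.
        xw, yw = forward_solve(low, x), forward_solve(low, y)
        if abs(pearson(xw, yw)) > r_crit:
            whitened_hits += 1
    return naive_hits / n_sims, whitened_hits / n_sims


if __name__ == "__main__":
    naive, whitened = false_positive_rates()
    print(f"naive false positive rate:    {naive:.3f}")
    print(f"whitened false positive rate: {whitened:.3f}")
```

Because the two traits are generated independently, every significant correlation is a false positive. The whitening step works only because it uses the same covariance structure that produced the data, which mirrors the argument that controls should reflect the data-generating process. In real data that structure is unknown and has to be modelled.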