Using simulations to explore sampling distributions: an antidote to hasty and extravagant inferences


Abstract

Most statistical inferences in neuroscience and psychology are based on frequentist statistics, which rely on sampling distributions: the long-run outcomes of multiple experiments, given a certain model. Yet, sampling distributions are poorly understood and rarely explicitly considered when making inferences. In this tutorial and commentary, I demonstrate how to use simulations to illustrate sampling distributions to answer simple practical questions: for instance, if we could run thousands of experiments, what would the outcome look like? What do these simulations tell us about the results from a single experiment? Such simulations can be run a priori, given expected results, or a posteriori, using existing datasets. Both approaches can help make explicit the data generating process and the sources of variability; they also reveal the large uncertainty in our experimental estimation and lead to the sobering realisation that, in most situations, we should not make a big deal out of results from a single experiment. Simulations can also help demonstrate how the selection of effect sizes conditional on some arbitrary cut-off (p≤0.05) leads to a literature filled with false positives, a powerful illustration of the damage done in part by researchers’ over-confidence in their statistical tools. The tutorial focuses on graphical descriptions and covers examples using correlation analyses, proportion data and response latency data. All the figures and numerical values in this article can be reproduced using code available at https://github.com/GRousselet/sampdist.
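To make the idea of "running thousands of experiments" concrete, here is a minimal sketch of an a priori simulation for a correlation analysis. It is not taken from the article's repository: the function name `simulate_correlations`, the population correlation of 0.2, the sample size of 20 and the number of experiments are all illustrative assumptions, and the p≤0.05 filter uses the Fisher z normal approximation rather than an exact test.

```python
import numpy as np


def simulate_correlations(rho, n, n_experiments, rng):
    """Return one sample Pearson correlation per simulated experiment.

    Hypothetical helper: each experiment draws n pairs from a bivariate
    normal population with unit variances and correlation rho.
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Shape (n_experiments, n, 2): one bivariate sample per experiment.
    data = rng.multivariate_normal(np.zeros(2), cov, size=(n_experiments, n))
    x, y = data[..., 0], data[..., 1]
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt(
        (xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1)
    )


# Assumed values, chosen only for illustration.
rho, n, n_experiments = 0.2, 20, 10_000
rng = np.random.default_rng(1)

# Sampling distribution of r: what many repeated experiments would produce.
r = simulate_correlations(rho, n, n_experiments, rng)
lo, hi = np.quantile(r, [0.025, 0.975])
print(f"mean r = {r.mean():.3f}, middle 95% of estimates: [{lo:.2f}, {hi:.2f}]")

# Approximate two-sided test of rho = 0 via the Fisher z transform
# (an approximation, assumed here to avoid extra dependencies).
z = np.arctanh(r) * np.sqrt(n - 3)
significant = np.abs(z) >= 1.96

# Effect sizes that survive the p <= 0.05 filter are inflated on average.
print(f"proportion significant = {significant.mean():.3f}")
print(f"mean |r| among significant results = {np.abs(r[significant]).mean():.3f}")
```

With these settings, the spread of `r` across experiments shows how much a single estimate can vary, and the mean absolute correlation among the "significant" experiments is necessarily larger than the population value, because only estimates beyond the critical threshold pass the filter.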
