Bayesian Statistical Hypothesis Testing in the Era of Big Data
Abstract
Big data poses serious challenges for traditional statistics and complicates Bayesian hypothesis testing, where Bayes factors often become numerically unstable and their behavior under large sample sizes remains unclear. This paper introduces a subsampling-based heuristic using Monte Carlo methods to address these issues. The method ensures stable inference and reveals how evidence strength changes with sample size. It supports a wide range of priors (including non-informative, empirical, and expert-elicited maximum entropy priors), enabling robust sensitivity analysis and context-specific conclusions, especially in high-dimensional settings. We demonstrate its effectiveness and interpretability through simulated human height data and Bayesian regression on neuron volume. This work advances a more stable and transparent approach to Bayesian hypothesis testing in the context of big data.
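To make the idea of subsampling concrete, the following is a minimal sketch and not the paper's actual procedure, which the abstract does not specify. It assumes a simple setting chosen purely for illustration: a point null H0: mu = mu0 against H1: mu ~ N(mu0, tau^2) for normally distributed data with known standard deviation sigma, where the Bayes factor has a closed form. The function names, the prior scale `tau`, and the simulated height values are all hypothetical. The sketch repeatedly draws random subsamples of several sizes and summarizes the log Bayes factor at each size, which keeps the computation on the log scale and shows how the evidence changes with sample size.

```python
import numpy as np

def log_bf10_normal_mean(sample, mu0, sigma, tau):
    """Log Bayes factor of H1: mu ~ N(mu0, tau^2) against H0: mu = mu0,
    for normal data with known sigma (an illustrative model assumption)."""
    n = sample.size
    xbar = sample.mean()
    v0 = sigma**2 / n        # variance of the sample mean under H0
    v1 = v0 + tau**2         # marginal variance of the sample mean under H1
    d2 = (xbar - mu0) ** 2
    # Difference of the two normal log densities evaluated at xbar.
    return 0.5 * np.log(v0 / v1) + 0.5 * d2 * (1.0 / v0 - 1.0 / v1)

def subsampled_log_bf(data, sizes, n_rep, mu0, sigma, tau, seed=0):
    """For each subsample size, draw n_rep random subsamples without
    replacement and return the mean and std of the log Bayes factor."""
    rng = np.random.default_rng(seed)
    summary = {}
    for n in sizes:
        vals = [
            log_bf10_normal_mean(
                rng.choice(data, size=n, replace=False), mu0, sigma, tau
            )
            for _ in range(n_rep)
        ]
        summary[n] = (float(np.mean(vals)), float(np.std(vals)))
    return summary

# Hypothetical "big" data set: simulated heights in cm (values are made up).
heights = np.random.default_rng(1).normal(171.0, 7.0, size=100_000)
summary = subsampled_log_bf(
    heights, sizes=[50, 500, 5000], n_rep=100, mu0=170.0, sigma=7.0, tau=5.0
)
for n, (mean_lbf, sd_lbf) in summary.items():
    print(f"n={n:5d}  mean log BF10={mean_lbf:8.2f}  sd={sd_lbf:6.2f}")
```

Changing `tau` (or swapping in a different prior with its own marginal likelihood) and rerunning the loop gives a rough sensitivity analysis in the spirit the abstract describes; working with log Bayes factors rather than raw ratios avoids overflow when the full data set is very large.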