Bayesian Statistical Hypothesis Testing in the Era of Big Data

Carlos Barrera-Causil
Juan Carlos Correa
Johny Javier Pambabay Calero
Sergio Alex Bauz Olvera
Daniel Andres Díaz-Pachon
Julian Tejada
Fernando Marmolejo-Ramos

Abstract

Big data poses serious challenges for traditional statistics and complicates Bayesian hypothesis testing. Bayes factors often become numerically unstable, and their behavior at large sample sizes remains unclear. This paper introduces a subsampling-based heuristic that uses Monte Carlo methods to address these issues. The method stabilizes inference and reveals how the strength of evidence changes with sample size. It supports a wide range of priors (non-informative, empirical, and expert-elicited maximum entropy priors), enabling robust sensitivity analysis and context-specific conclusions, especially in high-dimensional settings. We demonstrate its effectiveness and interpretability on simulated human height data and on a Bayesian regression of neuron volume. This work advances a more stable and transparent approach to Bayesian hypothesis testing in the context of big data.
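The abstract does not spell out the heuristic, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a normal model with known standard deviation `sigma`, a point null `mu = mu0`, and a normal alternative prior `mu ~ N(mu0, tau^2)`, for which the Bayes factor has a closed form in the sample mean. The function names `log_bf10_normal_mean` and `subsampled_log_bf` and all numeric settings (a hypothetical height population with mean 170.3 cm and SD 7 cm, `mu0 = 170`, `tau = 5`) are illustrative assumptions. The sketch shows two points raised in the abstract: working on the log scale avoids the overflow that makes raw Bayes factors unusable at large n, and repeating the computation over Monte Carlo subsamples of increasing size m traces how the evidence grows with sample size.

```python
import numpy as np


def log_bf10_normal_mean(xbar, n, mu0, sigma, tau):
    """Log Bayes factor log(BF10) for H0: mu = mu0 vs H1: mu ~ N(mu0, tau^2).

    Assumes x_i ~ N(mu, sigma^2) with sigma known. The sample mean xbar is
    sufficient: under H0 it is N(mu0, sigma^2/n), and under H1 (after
    integrating out mu) it is N(mu0, tau^2 + sigma^2/n). Everything stays on
    the log scale, so large n does not overflow the way exp(log BF) would.
    """
    s2n = sigma**2 / n       # variance of xbar given mu
    v1 = tau**2 + s2n        # marginal variance of xbar under H1
    d2 = (xbar - mu0) ** 2
    return 0.5 * np.log(s2n / v1) + 0.5 * d2 * (1.0 / s2n - 1.0 / v1)


def subsampled_log_bf(data, sizes, n_reps, mu0, sigma, tau, seed=0):
    """Monte Carlo subsampling (hypothetical helper): for each size m, draw
    n_reps subsamples without replacement and return {m: array of log BF10}."""
    rng = np.random.default_rng(seed)
    out = {}
    for m in sizes:
        vals = np.empty(n_reps)
        for b in range(n_reps):
            idx = rng.choice(data.size, size=m, replace=False)
            vals[b] = log_bf10_normal_mean(data[idx].mean(), m, mu0, sigma, tau)
        out[m] = vals
    return out


if __name__ == "__main__":
    # Hypothetical "big" dataset of heights in cm (illustrative values only).
    heights = np.random.default_rng(42).normal(170.3, 7.0, size=100_000)
    res = subsampled_log_bf(heights, sizes=(100, 1_000, 10_000), n_reps=100,
                            mu0=170.0, sigma=7.0, tau=5.0)
    for m, v in res.items():
        lo, hi = np.quantile(v, [0.05, 0.95])
        print(f"m={m:>6}: median log BF10 = {np.median(v):8.2f} "
              f"(90% band {lo:.2f} to {hi:.2f})")
```

Reporting the median and a quantile band of log BF10 for each subsample size, rather than a single full-data value, is one simple way to summarize how evidence accumulates. Using a different prior (for example an empirical or maximum entropy prior) would require replacing the closed-form marginal variance `v1` with the marginal likelihood appropriate to that prior.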

Version published to 10.21203/rs.3.rs-7313325/v1 on Research Square
Aug 28, 2025

Related articles

Inference using recall-based data from log-normal distribution

This article has 2 authors:
1. Rupsa Roy
2. Chandra Prakash Yadav
This article has no evaluations. Latest version Aug 29, 2025.

A Bayesian Informative Shrinkage Approach for Large-scale Multiple Hypothesis Testing (BISHOT): with Applications in Differential Analysis of Omics Data

This article has 3 authors:
1. Ya Su
2. Mary Eunice Joy Z. Clark
3. Chi Wang
This article has no evaluations. Latest version Sep 16, 2025.

Replicative significance index (RSI): A simulation-based metric for statistical inference and reproducibility

This article has 3 authors:
1. Vinay Suresh
2. Bhavik Bansal
3. Suhrud Panchawagh
This article has no evaluations. Latest version Sep 9, 2025.
