Choosing informative priors in Bayesian regression models. A simulation study and tutorial using Stan and R


Abstract

Background
Bayesian regression models provide a robust framework for complex data analysis and are particularly advantageous in the small-sample scenarios common in medical research. However, specifying appropriate prior distributions, which incorporate existing knowledge to regularize model parameters, remains a challenge for many researchers and can lead to unstable or implausible estimates. This study aims to demonstrate the impact of different prior distributions on regression models and to provide a practical guide for choosing and justifying informative priors that produce more stable and credible results.

Methods
The study involved two parts. First, a simulation study systematically assessed the sensitivity of Bayesian linear regression models to prior specification: sample size, prior location, and prior scale were varied to observe their impact on posterior estimates for a known true effect size. Second, a case-control study using real-world patient data (N = 526) demonstrated the practical application of choosing informative priors. Bayesian logistic regression models were used to analyse the relationship between severe dementia and fall incidence, comparing results from priors based on existing literature (“believer”), conservative priors (“agnostic”), and priors assuming an opposite effect (“sceptical”).

Results
The simulation study showed that strongly informative priors had a substantial influence on posterior estimates, particularly at smaller sample sizes. As the sample size increased, the influence of the data grew and the estimates converged toward the true effect. In the case-control study, a standard frequentist analysis produced an odds ratio of 8.87 with a very wide and unstable confidence interval (1.66–165.19). In contrast, a Bayesian model using a moderately informative “believer” prior derived from existing research yielded a more plausible odds ratio of 4.40 with a substantially narrower and more precise credible interval (1.82–12.54).

Conclusions
The careful and transparent specification of informative priors is a critical tool in Bayesian analysis, especially when data are sparse. By incorporating justified, evidence-based assumptions, researchers can regularize models to prevent implausible outcomes and produce more stable, interpretable, and credible results. This approach enhances the robustness of statistical inference in fields where small sample sizes are a frequent challenge.
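The simulation finding — that a strongly informative prior dominates at small sample sizes while the data take over as the sample grows — can be illustrated with a minimal sketch. This is not the paper's Stan/R simulation code; it uses a conjugate normal-normal model (where the posterior has a closed form) with an assumed true effect of 2.0 and an assumed prior centred away from the truth, purely to show the qualitative pattern the abstract describes.

```python
def posterior_mean_sd(prior_mean, prior_sd, data_mean, data_sd, n):
    """Closed-form posterior for a normal mean with known data SD.

    The posterior mean is a precision-weighted average of the prior mean
    and the sample mean; the data precision grows linearly with n, so the
    prior's pull fades as the sample size increases.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / data_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec**-0.5

# Illustrative values, not from the paper:
true_effect = 2.0                  # "known true effect size"
prior_mean, prior_sd = 0.0, 0.5    # strongly informative prior centred at zero

for n in (10, 100, 1000):
    m, s = posterior_mean_sd(prior_mean, prior_sd, true_effect, 1.0, n)
    print(f"n={n:5d}  posterior mean={m:.3f}  posterior sd={s:.3f}")
```

At n = 10 the posterior mean sits well below the true effect because the prior carries comparable precision to the data; by n = 1000 the estimate has essentially converged to the truth, mirroring the simulation result reported above.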
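The believer/agnostic/sceptical comparison can likewise be sketched with a back-of-the-envelope normal approximation on the log-odds-ratio scale. This is not the paper's Bayesian logistic regression and will not reproduce its MCMC results; the log-OR and its standard error are back-computed from the reported frequentist OR of 8.87 and CI (1.66–165.19), and the three prior means and scales below are assumptions chosen only to illustrate how each prior pulls the unstable estimate.

```python
import math

def combine(prior_mean, prior_sd, est, se):
    """Precision-weighted normal approximation to the posterior log-OR."""
    prior_prec, data_prec = prior_sd**-2, se**-2
    post_prec = prior_prec + data_prec
    mean = (prior_prec * prior_mean + data_prec * est) / post_prec
    return mean, post_prec**-0.5

# Frequentist log-OR and SE recovered from the reported OR and 95% CI
# (an approximation, not the paper's raw data):
est = math.log(8.87)
se = (math.log(165.19) - math.log(1.66)) / (2 * 1.96)

# Hypothetical prior specifications on the log-OR scale:
priors = {
    "believer":  (math.log(4), 1.0),   # literature suggests a positive effect
    "agnostic":  (0.0, 2.5),           # weakly informative, centred at no effect
    "sceptical": (-math.log(4), 1.0),  # assumes the opposite effect
}

for name, (pm, psd) in priors.items():
    m, s = combine(pm, psd, est, se)
    lo, hi = m - 1.96 * s, m + 1.96 * s
    print(f"{name:9s} OR={math.exp(m):5.2f}  "
          f"approx 95% interval ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
```

Even in this crude approximation, the believer prior shrinks the implausibly large frequentist OR toward a more moderate value with a narrower interval, while the sceptical prior pulls it close to no effect — the qualitative behaviour the case-control analysis demonstrates.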