Bayesian LASSO with Categorical Predictors: Coding Strategies, Uncertainty Quantification, and Healthcare Applications

Xi Lu
Jieni Li
Rajender R. Aparasu
Nebil Yusuf
Cen Wu

Abstract

There is growing interest in applying statistical machine learning methods, such as LASSO regression and its extensions, to healthcare datasets. A representative recent study by Huang et al. examined LASSO and group LASSO regression with categorical predictors, which are widely used in healthcare studies to represent variables with nominal or ordinal categories. Despite the success of these studies, statistical inference and uncertainty quantification for regression with categorical predictors have largely been overlooked, partly due to the theoretical challenges practitioners face when applying these methods in behavioral research. In this article, we aim to fill this gap by investigating the problem from a Bayesian perspective. Specifically, we conduct Bayesian LASSO analysis with categorical predictors and thoroughly investigate the impact of four representative coding strategies on variable selection and prediction. In particular, we quantify uncertainty through marginal Bayesian credible intervals, leveraging the fact that fully Bayesian analysis enables exact statistical inference even in finite samples. Using the real-world Medical Expenditure Panel Survey (MEPS) data, we demonstrate that the variable selection, estimation, and prediction of Bayesian LASSO are influenced by the coding strategy. We also compare the performance of Bayesian LASSO with that of LASSO and linear regression.
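To make the notion of a coding strategy concrete, the following is a minimal sketch of two widely used schemes, treatment (dummy) coding and effect (sum-to-zero) coding. The abstract does not name the four strategies the authors study, so the choice of these two is an assumption, and the variable name, levels, and helper functions are hypothetical. The point of the sketch is only that the same categorical column yields different design-matrix rows, and therefore coefficients with different meanings: a difference from the reference level under dummy coding, and a deviation from the unweighted mean of the level means under effect coding. Because LASSO-type shrinkage pulls each coefficient toward zero, this difference in meaning is one reason selection results can depend on the coding.

```python
from typing import Dict, List, Sequence


def dummy_coding(levels: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Treatment (dummy) coding: the reference level maps to all zeros."""
    non_ref = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == col else 0 for col in non_ref] for lv in levels}


def effect_coding(levels: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Effect (sum-to-zero) coding: the reference level maps to all -1."""
    non_ref = [lv for lv in levels if lv != reference]
    book = {}
    for lv in levels:
        if lv == reference:
            book[lv] = [-1] * len(non_ref)
        else:
            book[lv] = [1 if lv == col else 0 for col in non_ref]
    return book


def encode(values: Sequence[str], codebook: Dict[str, List[int]]) -> List[List[int]]:
    """Turn a column of category labels into design-matrix rows."""
    return [codebook[v] for v in values]


# Hypothetical categorical predictor (not taken from the MEPS data).
levels = ["public", "private", "uninsured"]
values = ["private", "uninsured", "public", "private"]

dummy_rows = encode(values, dummy_coding(levels, reference="public"))
effect_rows = encode(values, effect_coding(levels, reference="public"))

print(dummy_rows)
print(effect_rows)
```

Both schemes use two columns for a three-level predictor; only the row assigned to the reference level differs, which is enough to change what each regression coefficient measures.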

Version published to 10.20944/preprints202510.0425.v1
Oct 8, 2025

Related articles

Choosing informative priors in Bayesian regression models. A simulation study and tutorial using Stan and R

This article has 4 authors:
1. Daniel Lüdecke
2. Anna Makowski
3. Jens Klein
4. Dominique Makowski
This article has no evaluations. Latest version: Oct 6, 2025

Predictive Performance Precision Analysis in Medicine: Identification of low-confidence predictions at patient and profile levels (MED3pa I)

This article has 7 authors:
1. Olivier Lefebvre
2. Félix Camirand Lemyre
3. Jean-François Ethier
4. Lyna Hiba Chikouche
5. Ludmila Amriou
6. Dan Poenaru
7. Martin Vallières
This article has no evaluations. Latest version: Aug 26, 2025

Quantifying Uncertainty in Polygenic Risk Scores Using Conformalized Quantile Regression

This article has 9 authors:
1. Chen Wang
2. Fan Wang
3. Malgorzata Bogdan
4. Marco Masala
5. Edoardo Fiorillo
6. Marcella Devoto
7. Francesco Cucca
8. Dan Belsky
9. Iuliana Ionita-Laza
This article has no evaluations. Latest version: Oct 14, 2025
