Bayesian LASSO with Categorical Predictors: Coding Strategies, Uncertainty Quantification, and Healthcare Applications

Abstract

There is growing interest in applying statistical machine learning methods, such as LASSO regression and its extensions, to healthcare datasets. A representative recent study by Huang et al. examined LASSO and group LASSO regression with categorical predictors, which are widely used in healthcare studies to represent variables with nominal or ordinal categories. Despite the success of these studies, statistical inference and uncertainty quantification for regression with categorical predictors have largely been overlooked, partly because of the theoretical challenges practitioners face when applying these methods in behavioral research. In this article, we aim to fill this gap by investigating the problem from a Bayesian perspective. Specifically, we conduct Bayesian LASSO analysis with categorical predictors and thoroughly investigate the impact of four representative coding strategies on variable selection and prediction. In particular, we quantify uncertainty through marginal Bayesian credible intervals, leveraging the fact that fully Bayesian analysis enables exact statistical inference even on finite samples. Using real-world Medical Expenditure Panel Survey (MEPS) data, we demonstrate that the variable selection, estimation, and prediction of Bayesian LASSO are influenced by the coding strategy. We also compare the performance of Bayesian LASSO with that of LASSO and linear regression.
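
To make the idea of a coding strategy concrete, the following minimal Python sketch builds two common codings of a three-level categorical predictor: treatment (dummy) coding and sum-to-zero (deviation) coding. The abstract does not name the four strategies studied, so these two are illustrative assumptions only, and the function names, toy data, and response values are hypothetical.

```python
import numpy as np

def dummy_coding(codes, k):
    """Treatment (dummy) coding: level 0 is the reference; returns n x (k-1)."""
    return np.eye(k)[np.asarray(codes)][:, 1:]

def deviation_coding(codes, k):
    """Sum-to-zero coding: the last level is coded -1 in every column."""
    contrast = np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])
    return contrast[np.asarray(codes)]

# Hypothetical toy data: five observations of a 3-level factor (levels 0, 1, 2).
codes = np.array([0, 1, 2, 1, 0])
y = np.array([1.0, 2.0, 4.0, 3.0, 1.5])

X_dummy = dummy_coding(codes, 3)
X_dev = deviation_coding(codes, 3)

def ols_fit(X, y):
    """Unpenalized least-squares fitted values with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return A @ beta

# Both codings span the same column space once an intercept is included,
# so ordinary least squares gives identical fitted values.
same_fit = np.allclose(ols_fit(X_dummy, y), ols_fit(X_dev, y))
print(X_dummy)
print(X_dev)
print(same_fit)
```

Because the two design matrices span the same space, unpenalized least squares cannot distinguish between them. A LASSO penalty, or the corresponding shrinkage prior in Bayesian LASSO, acts on the individual coefficients, whose meaning changes with the coding, which is why selection and estimation can depend on the coding choice.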
