GRASS-NB: Group-structured variable selection for spatial negative binomial data with applications to cancer registry and spatial omics

Abstract

Spatially structured, overdispersed count data with high-dimensional predictors are increasingly observed across studies ranging from population-level epidemiology to cellular-level spatial omics. Feature selection is critical for identifying influential predictors, such as key risk factors or biomarkers. Few Bayesian studies have assessed negative binomial regression (NBR) models with standard variable selection priors, such as the mixture spike-and-slab (SS) or the continuous horseshoe (HS), and those mostly in aspatial settings. Features often form groups; for instance, in population surveys, caloric intake and physical activity may fall under “Diet & Exercise”, while cigarette use and smoking laws belong to “Smoking”. We propose a flexible NBR model that accommodates spatial autocorrelation and introduces a novel group-structured prior by hybridizing SS and HS shrinkage. The model’s performance with different priors is evaluated in terms of specificity, precision, and computational cost under challenging scenarios, including “large p, small n” cases. We further apply the model to CDC state-level cancer data, comprising demographic, screening, and behavioral covariates, to identify key drivers and population-level risk factors, and to a melanoma spatial omics dataset for predictive modeling of gene expression. An efficient R package is provided on GitHub.
