CBKMR: A Copula-based Bayesian Kernel Machine Regression Framework for Optimal Marker Detection in Omics Data
Abstract
High-throughput bulk and single-cell omics technologies enable comprehensive molecular profiling, yet identifying compact, biologically interpretable marker sets that distinguish cell types, conditions, or disease states remains challenging. Standard pipelines rely on univariate differential expression tests, which ignore gene–gene dependencies and nonlinear effects, while multivariate machine-learning (ML) methods often lack principled feature selection and uncertainty quantification. The Bayesian kernel machine regression (BKMR) framework offers an appealing alternative because it (a) captures nonlinear gene–outcome relationships and higher-order interactions, and (b) enables automatic relevance determination (ARD) through sparsity-inducing priors. However, we show that the traditional latent Gaussian process (GP) formulation of BKMR is inadequate for discrete outcomes (e.g., cell-type labels), leading to biased inference and unstable variable selection. We propose a copula-based Bayesian kernel machine regression (CBKMR) model that uses outcome-appropriate discrete marginals while a Gaussian copula captures kernel-induced dependence across observations. To ensure scalability to modern single-cell datasets, we further introduce a nearest-neighbor GP-based variant, NNCBKMR, which reduces computational complexity from $O(N^3)$ to nearly linear in the number of observations $N$. Simulation studies show that CBKMR more accurately captures nonlinear effects and yields stronger marker-selection performance than BKMR and leading ensemble ML methods (e.g., random forests, XGBoost). Applications to multiple scRNA-seq datasets demonstrate that CBKMR identifies concise marker panels that align closely with expert-annotated gene signatures while providing posterior uncertainty for principled decision-making.
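The copula construction summarized above can be illustrated generatively. This is a minimal sketch, not the authors' CBKMR implementation, and it rests on several assumptions:

- It uses a BKMR-style Gaussian kernel with per-gene relevance weights `r`. A weight of zero removes a gene from the kernel, which is the mechanism ARD priors exploit.
- It uses Bernoulli marginals with given probabilities `p`, standing in for a binary cell-type indicator.
- It adds a small nugget so the kernel correlation matrix can be factorized.

All function names are hypothetical. A latent vector $u \sim \mathcal{N}(0, R)$ is drawn with $R$ built from the kernel, and each label is set to $y_i = F_i^{-1}(\Phi(u_i))$. The marginals are therefore exactly Bernoulli($p_i$), while dependence across cells follows the kernel.

```python
import math
import numpy as np

def ard_gaussian_kernel(Z, r):
    """K[i, k] = exp(-sum_j r[j] * (Z[i, j] - Z[k, j])**2).

    r[j] >= 0 is a hypothetical per-gene relevance weight; r[j] == 0 means
    gene j has no influence on the kernel (the ARD / variable-selection idea).
    """
    diff = Z[:, None, :] - Z[None, :, :]                 # shape (N, N, P)
    return np.exp(-np.einsum("ikj,j->ik", diff ** 2, r))

def std_normal_cdf(x):
    # Phi(x) via the error function (avoids a SciPy dependency)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def sample_copula_bernoulli(Z, r, p, nugget=1e-6, rng=None):
    """Draw binary labels with Bernoulli(p_i) marginals whose dependence across
    observations follows a Gaussian copula with kernel-based correlation.
    Illustrative sketch only; not the paper's sampler."""
    rng = np.random.default_rng(rng)
    N = Z.shape[0]
    R = ard_gaussian_kernel(Z, r) + nugget * np.eye(N)   # jitter keeps R positive definite
    d = np.sqrt(np.diag(R))
    R = R / np.outer(d, d)                               # rescale to unit diagonal (a correlation matrix)
    u = np.linalg.cholesky(R) @ rng.standard_normal(N)   # latent u ~ N(0, R)
    v = std_normal_cdf(u)                                # each v_i is Uniform(0, 1)
    # Inverse CDF of Bernoulli(p_i): y_i = 1 exactly when v_i > 1 - p_i
    return (v > 1.0 - p).astype(int)
```

Two design choices matter here. Selection and dependence live in the kernel, and the outcome scale lives in the marginals. Because of this split, the discrete labels are never treated as a thresholded continuous response with a Gaussian likelihood.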
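The nearest-neighbor scaling idea can be sketched with a generic Vecchia-type (NNGP) factorization of the latent Gaussian density. Under this factorization,

$\log p(u) \approx \sum_i \log \mathcal{N}(u_i \mid b_i^\top u_{\mathrm{nb}(i)}, f_i),$

where $\mathrm{nb}(i)$ holds at most $m$ earlier points. Each term then needs only an $m \times m$ solve instead of one $N \times N$ factorization.

This is a standard nearest-neighbor GP approximation under assumed choices, not NNCBKMR's actual algorithm. The assumptions are an ARD Gaussian kernel, the input order as given, and brute-force neighbor search. Function names are hypothetical.

```python
import numpy as np

def ard_cross_kernel(A, B, r):
    """k(a, b) = exp(-sum_j r[j] * (a_j - b_j)**2) for every row pair of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.einsum("ikj,j->ik", diff ** 2, r))

def nn_gp_loglik(u, Z, r, m=10, nugget=1e-6):
    """Vecchia / nearest-neighbor approximation of log N(u | 0, K + nugget * I).

    Each u_i is conditioned only on up to m earlier points (in the given order)
    that are closest in the ARD-weighted metric, so each term costs O(m^3).
    """
    N = len(u)
    ll = 0.0
    for i in range(N):
        var = 1.0 + nugget              # k(z_i, z_i) = 1 for this kernel, plus nugget
        mean = 0.0
        if i > 0:
            # Brute-force search over earlier points (O(N) each); a spatial index
            # would be needed for the overall pass to be near-linear in N.
            d2 = ((Z[:i] - Z[i]) ** 2) @ r
            nb = np.argsort(d2)[:m]
            K_nn = ard_cross_kernel(Z[nb], Z[nb], r) + nugget * np.eye(len(nb))
            k_in = ard_cross_kernel(Z[nb], Z[i:i + 1], r)[:, 0]
            b = np.linalg.solve(K_nn, k_in)       # m x m solve
            mean = b @ u[nb]                      # conditional mean given neighbors
            var = var - k_in @ b                  # conditional variance f_i
        ll += -0.5 * (np.log(2.0 * np.pi * var) + (u[i] - mean) ** 2 / var)
    return ll
```

When $m \ge N - 1$, every point conditions on all of its predecessors. The sum then reduces to the exact chain-rule factorization, which makes a convenient correctness check. Smaller $m$ trades accuracy for cost.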