Bayesian PASA: Provably Stable Adaptive Activation with Uncertainty Quantification


Abstract

The choice of activation function is a fundamental design decision in deep learning, yet most popular options, such as ReLU, GELU, and Swish, are static and treat all inputs uniformly. This one-size-fits-all approach breaks down in the presence of noisy or corrupted data, where the optimal non-linearity should depend on the input's statistical context. In this paper, we introduce Bayesian Probabilistic Adaptive Sigmoidal Activation (Bayesian PASA), a novel activation function that dynamically adapts its behavior based on the input's uncertainty. Bayesian PASA frames activation selection as a Bayesian model averaging problem, adaptively mixing sigmoidal, linear, and noise-aware behaviors. The mixing weights are derived from a principled variational evidence lower bound (ELBO), regularized by a stable ψ-function that guarantees bounded influence from noise estimates. We provide three formal theorems proving its Lipschitz continuity, gradient stability, and convergence under standard training assumptions. On the challenging CIFAR-100 benchmark, Bayesian PASA achieves a test accuracy of 76.38%, outperforming ReLU (75.68%), GELU (75.98%), and the original PASA (75.53%). On the corrupted CIFAR-10-C dataset, the full Bayesian PASA model combined with Bayesian R-LayerNorm achieves an average accuracy of 53.91%, a +1.87% improvement over the ReLU+LayerNorm baseline. This work provides a drop-in replacement for existing activations, offering not only improved performance but also built-in uncertainty quantification for more robust deep learning systems.
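The mixing mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names, the evidence scores, and the `noise_scale` parameter are assumptions, since the abstract does not give the ELBO-derived weights or the exact ψ-function. The sketch only shows the general shape of the idea, namely a convex combination of sigmoidal, linear, and bounded-influence branches with input-dependent weights.

```python
import math

def bayesian_pasa_sketch(x: float, noise_scale: float = 0.1) -> float:
    """Illustrative mix of sigmoidal, linear, and noise-damped branches.

    The real method derives the mixing weights from a variational ELBO;
    here we use simple hand-crafted evidence scores passed through a
    softmax, purely to show the model-averaging structure.
    """
    # Three candidate behaviors for the activation.
    sigmoidal = 1.0 / (1.0 + math.exp(-x))      # bounded, saturating
    linear = x                                   # identity pass-through
    damped = x / (1.0 + noise_scale * abs(x))    # bounded-influence, psi-like

    # Hypothetical evidence scores: prefer saturation for large |x|,
    # linearity near zero, and damping when the noise estimate is large.
    scores = [abs(x) - 1.0, -abs(x), math.log(noise_scale + 1e-8) + 2.0]

    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]

    # Bayesian-model-averaging style convex combination.
    return w[0] * sigmoidal + w[1] * linear + w[2] * damped
```

Because the weights form a convex combination, the output always lies between the smallest and largest branch value, which is the intuition behind the bounded-influence guarantee the paper formalizes.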
