β-Optimization in the Information Bottleneck Framework: A Theoretical Analysis

Abstract

The Information Bottleneck (IB) framework formalizes the trade-off between compression and prediction in representation learning. A crucial parameter is the Lagrange multiplier β, which controls the balance between preserving information relevant to a target variable Y and compressing the representation Z of an input X. Selecting an optimal β (denoted β∗) is challenging and typically done via empirical tuning. In this paper, I present a rigorous theoretical analysis of β∗-optimization in both the Variational IB (VIB) and Neural IB (NIB) settings. I define β∗ as the critical value of β that marks the boundary between non-trivial (informative) and trivial (uninformative) representations, ensuring maximal compression before the representation collapses, and I derive formal conditions for its existence and uniqueness. I establish several key results: (1) the IB trade-off curve (relevance–compression frontier) is concave under mild conditions, implying that β, as the slope of this curve, uniquely characterizes optimal operating points in regular cases; (2) there exists a critical threshold β∗ = F′(0+) (the slope of the IB curve at zero compression), beyond which the IB solution collapses to a trivial representation; (3) for practical IB implementations (VIB and NIB), β∗ can be computed algorithmically, and I analyze the complexity of naive β-sweeping versus adaptive methods such as binary search, for which pseudo-code is provided. I provide formal theorems and proofs for concavity properties of the IB Lagrangian, continuity of the IB curve, and boundedness of the mutual information quantities involved. Furthermore, I compare standard IB, VIB, and NIB formulations in terms of the optimal β, showing that while standard IB provides a theoretical target for β∗, variational and neural approximations may deviate from this optimum. The analysis is complemented by a discussion of the implications for deep neural network representations. The results establish a principled foundation for β selection in IB, guiding practitioners to achieve maximal meaningful compression without exhaustive trial-and-error.
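
As a brief illustration of the adaptive search mentioned above, the sketch below performs a binary search for β∗ under the objective of maximizing I(Z;Y) − β·I(X;Z): for β below β∗ the learned representation remains informative, while for β above β∗ it collapses to a trivial one. This is a minimal sketch, not the paper's pseudo-code; the helpers train_ib_model and mutual_info_zy are hypothetical placeholders for a VIB/NIB training run and an I(Z;Y) estimator, and the default bounds and collapse threshold are purely illustrative.

from typing import Callable

def find_beta_star(
    train_ib_model: Callable[[float], object],    # hypothetical: train a VIB/NIB model at the given beta
    mutual_info_zy: Callable[[object], float],    # hypothetical: estimate I(Z;Y) of the trained model
    beta_lo: float = 0.0,
    beta_hi: float = 100.0,
    tol: float = 1e-3,
    collapse_eps: float = 1e-2,
) -> float:
    """Binary search for the critical beta* separating informative from trivial solutions."""
    while beta_hi - beta_lo > tol:
        beta_mid = 0.5 * (beta_lo + beta_hi)
        model = train_ib_model(beta_mid)            # optimize I(Z;Y) - beta * I(X;Z) at beta_mid
        if mutual_info_zy(model) > collapse_eps:    # representation still informative
            beta_lo = beta_mid                      # collapse must occur at a larger beta
        else:
            beta_hi = beta_mid                      # already collapsed; beta* lies below beta_mid
    return 0.5 * (beta_lo + beta_hi)

Each iteration halves the search interval, so locating β∗ to precision tol costs O(log((beta_hi − beta_lo)/tol)) training runs, versus a number of runs proportional to the grid resolution for naive β-sweeping.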
