Optimizing Explainability-Accuracy Trade-offs in Deep Neural Networks via Constrained Information Bottleneck Regularization
Abstract
The increasing complexity of deep neural networks (DNNs) has created a pressing need to manage the trade-off between model accuracy and explainability. In this paper, we introduce a novel framework employing constrained information bottleneck regularization to explicitly balance these two competing objectives. Our methodology formalizes the relationship between accuracy and explainability as a constrained optimization problem, enabling the development of interpretable models without sacrificing predictive power. We develop the mathematical underpinnings of the approach, detailing the use of dual decomposition techniques and differentiable surrogate objectives for efficient implementation. Comprehensive empirical evaluations on vision and language benchmarks demonstrate significant improvements in the explainability-accuracy trade-off over state-of-the-art methods. Our findings show that the framework can produce models that are high-performing while adhering to stringent explainability constraints. Ultimately, this work aims to catalyze a shift within the AI community toward reliable, transparent, and interpretable AI systems, supporting their responsible deployment in high-stakes applications.
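To make the abstract's description concrete, the sketch below shows one plausible instantiation of a constrained information bottleneck loss: the standard Gaussian KL divergence serves as a differentiable upper bound on the bottleneck term I(X; Z), and the constraint is enforced with a Lagrange multiplier updated by projected dual ascent. This is a minimal illustration, not the paper's implementation; the class and parameter names (ConstrainedIBLoss, kl_budget, dual_lr) are hypothetical, and the paper's actual explainability surrogate and decomposition scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConstrainedIBLoss(nn.Module):
    """Sketch of a constrained information-bottleneck objective.

    The Gaussian KL term KL(q(z|x) || N(0, I)) is a standard differentiable
    upper bound on I(X; Z); the constraint I(X; Z) <= kl_budget is enforced
    via a Lagrange multiplier updated by projected dual ascent.
    """

    def __init__(self, kl_budget: float = 0.1, dual_lr: float = 1e-2):
        super().__init__()
        self.kl_budget = kl_budget                    # constraint level C (assumed)
        self.dual_lr = dual_lr                        # dual-ascent step size (assumed)
        self.register_buffer("lam", torch.zeros(()))  # multiplier, kept >= 0

    def forward(self, logits, targets, mu, logvar):
        # KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
        ce = F.cross_entropy(logits, targets)
        # Lagrangian: task loss + multiplier * (constraint violation)
        loss = ce + self.lam * (kl - self.kl_budget)
        # Dual ascent with projection onto lam >= 0: the multiplier grows
        # only while the KL term exceeds its budget, and decays otherwise.
        with torch.no_grad():
            new_lam = self.lam + self.dual_lr * (kl - self.kl_budget)
            self.lam.copy_(new_lam.clamp_min(0.0))
        return loss
```

In this setup, a stochastic encoder would output (mu, logvar), the representation z would be sampled via the reparameterization trick, and a classifier head would map z to logits. Freezing the multiplier at a fixed value beta recovers the familiar unconstrained IB Lagrangian as a special case, which is what makes the constrained formulation a strict generalization.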