A Unified Framework for Non-Convex Optimization in Deep Learning via Adaptive Variance Reduction
Abstract
The increasing complexity of deep learning models demands optimization techniques that can effectively navigate non-convex loss landscapes. Traditional methods such as stochastic gradient descent (SGD) suffer from high variance in their gradient estimates, which slows convergence and degrades final performance. This paper proposes a unified framework for adaptive variance reduction in stochastic non-convex optimization that addresses these issues. We introduce the Adaptive Variance Reduced Gradient (AVRG) algorithm, which dynamically balances the degree of variance reduction against computational cost, yielding improved convergence rates and greater robustness. Our framework synthesizes existing adaptive variance reduction methods into a cohesive theoretical treatment, filling gaps in the current literature, and we establish convergence guarantees for the proposed approach. Comprehensive empirical evaluations on diverse deep learning benchmarks show that AVRG converges faster and achieves better final performance than established baselines. By equipping researchers and practitioners with stronger optimization strategies, this work supports more effective training of deep neural networks across a wide range of applications and opens directions for future research on more complex optimization scenarios.
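To make the variance-reduction idea concrete, the sketch below shows a generic SVRG-style update in which the stochastic gradient is corrected by a control variate computed at a periodic snapshot, and a heuristic weight blends the plain and variance-reduced estimates. The toy least-squares objective, the snapshot schedule, and the `alpha` adaptation rule are illustrative assumptions only; they are not the AVRG algorithm defined in the paper.

```python
# Minimal sketch of an SVRG-style variance-reduced step with a heuristic
# adaptive mixing weight. The toy least-squares problem, the per-epoch
# snapshot schedule, and the `alpha` update are illustrative assumptions,
# not the paper's AVRG algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):
    """Per-example gradient of the squared loss 0.5 * (x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    """Full-batch gradient, used to build the control variate."""
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
lr = 0.01
for epoch in range(20):
    w_snap = w.copy()        # snapshot point for the control variate
    mu = full_grad(w_snap)   # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        g_sgd = grad_i(w, i)
        g_vr = g_sgd - grad_i(w_snap, i) + mu   # SVRG-style estimate
        # Heuristic adaptation: trust the variance-reduced estimate more
        # while the iterate stays near the snapshot, less as it drifts.
        alpha = 1.0 / (1.0 + np.linalg.norm(w - w_snap))
        w -= lr * (alpha * g_vr + (1.0 - alpha) * g_sgd)
    print(f"epoch {epoch:2d}  loss {0.5 * np.mean((X @ w - y) ** 2):.6f}")
```

The key design point illustrated here is the trade-off the abstract refers to: the variance-reduced estimate is unbiased and has low variance near the snapshot, but it requires periodic full-gradient passes, so an adaptive scheme must weigh its benefit against that extra computation.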