Reevaluating Zero Initialization in Deep Learning
Abstract
For nearly fifty years, the AI community has believed that zero initialization is ineffective for neural networks. Our study challenges this misconception by introducing methods that enable successful learning even when all weights and biases are initialized to zero. We further propose that random initialization can be viewed as one of many realizations within a broader zero-initialization framework. Experiments on MNIST, CIFAR-10, and CIFAR-100 using multilayer perceptrons (MLPs), convolutional neural networks (CNNs), residual networks (ResNets), vision transformers (ViTs), and multilayer perceptron mixers (MLP-Mixers) show that zero initialization can match or even surpass random initialization in certain scenarios, particularly with MLPs and CNNs. Notably, MLP-Mixers retained full performance even when half of their parameters were initialized to zero. These findings position random initialization as a special case of zero-centered symmetry breaking, refute the longstanding belief that zero initialization inherently degrades network performance, and open new possibilities for neural network training.
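To make the setting concrete, the sketch below shows what "all weights and biases initialized to zero" means for a small PyTorch MLP. This is only an illustration of the initialization itself, not the paper's method: the abstract does not specify the symmetry-breaking mechanism, and the model size, layer names, and the `zero_initialize` helper are assumptions for the example. Without some additional symmetry-breaking step, plain gradient descent keeps the hidden units of such a network identical, which is the classical reason zero initialization has been considered unusable.

```python
import torch
import torch.nn as nn

def zero_initialize(model: nn.Module) -> nn.Module:
    """Set every learnable parameter of `model` to zero in place.

    Illustrative helper only; the paper's symmetry-breaking procedure
    is not described in the abstract and is not implemented here.
    """
    with torch.no_grad():
        for param in model.parameters():
            param.zero_()
    return model

# Hypothetical small MLP for MNIST-sized inputs (784 -> 256 -> 10).
mlp = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
zero_initialize(mlp)

# Sanity check: every weight and bias is exactly zero.
assert all(torch.all(p == 0) for p in mlp.parameters())
```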