Reevaluating Zero Initialization in Deep Learning
Abstract
For nearly fifty years, the AI community has believed that zero initialization is ineffective for neural networks. Our study challenges this misconception by introducing methods that enable successful learning even when all weights and biases are initialized to zero. We further propose that random initialization can be viewed as one of many realizations within a broader zero-initialization framework. Experiments on MNIST, CIFAR-10, and CIFAR-100 using multilayer perceptrons (MLPs), convolutional neural networks (CNNs), residual networks (ResNets), vision transformers (ViTs), and multilayer perceptron mixers (MLP-Mixers) show that zero initialization can match or even surpass random initialization in certain scenarios, particularly with MLPs and CNNs. Notably, MLP-Mixers retained full performance even when half of their parameters were initialized to zero. These findings position random initialization as a special case of zero-centered symmetry breaking, refute the longstanding belief that zero initialization inherently degrades network performance, and open new possibilities for neural network training.
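To make the setting concrete, the sketch below shows what "all weights and biases initialized to zero" means for a small PyTorch MLP. This is only an illustration of the initialization itself, not the paper's method: the abstract does not specify the symmetry-breaking mechanism, and the model size, layer names, and the `zero_initialize` helper are assumptions for the example. Without some additional symmetry-breaking step, plain gradient descent keeps the hidden units of such a network identical, which is the classical reason zero initialization has been considered unusable.

```python
import torch
import torch.nn as nn

def zero_initialize(model: nn.Module) -> nn.Module:
    """Set every learnable parameter of `model` to zero in place.

    Illustrative helper only; the paper's symmetry-breaking procedure
    is not described in the abstract and is not implemented here.
    """
    with torch.no_grad():
        for param in model.parameters():
            param.zero_()
    return model

# Hypothetical small MLP for MNIST-sized inputs (784 -> 256 -> 10).
mlp = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
zero_initialize(mlp)

# Sanity check: every weight and bias is exactly zero.
assert all(torch.all(p == 0) for p in mlp.parameters())
```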