Deep Learning with Zero Initialization: Revisiting Symmetry Breaking and Gradient Flow
Abstract
For decades, the artificial intelligence (AI) community has held that zero initialization is ineffective for neural networks. Our study challenges this misconception by introducing a method that enables successful learning even when all weights and biases are initialized to zero. Beyond this method, we also examine mixed initialization schemes in which zero and random initialization coexist across different layers or parameters, showing that learning remains effective even under such partially randomized settings. Experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet using multilayer perceptrons (MLPs), convolutional neural networks (CNNs), residual networks (ResNets), vision transformers (ViTs), and MLP-Mixers show that zero initialization can match or even surpass random initialization in certain scenarios, particularly with MLPs and CNNs. Notably, MLP-Mixers retain comparable performance despite having no randomly initialized parameters. These findings position random initialization as a special case of zero-centered symmetry breaking, refute the longstanding belief that zero initialization inherently degrades neural network performance, and open new possibilities for neural network training. To systematize these insights, we propose the "Seo Integrated Zero Initialization: Foundational Scheme (SIZIFS)", a unified conceptual structure that categorizes artificial neural network initialization strategies into weight-level, node-level, and context-dependent types. Implementation code is publicly available at: https://github.com/sjw007s/Deep-Learning-with-Zero-Initialization-Revisiting-Symmetry-Breaking-and-Gradient-Flow.
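
As a concrete illustration of the mixed initialization schemes described above, the sketch below zero-initializes only the final classifier layer of a small MLP while the hidden layers keep PyTorch's default random initialization. This is a minimal sketch under assumed tooling (PyTorch) and an illustrative layer assignment; it is not the paper's specific method, which is provided in the linked repository.

# Minimal sketch (assumption: PyTorch) of a mixed zero/random initialization,
# where zero and random initialization coexist across different layers.
# Illustrative only; see the linked repository for the paper's actual scheme.
import torch
import torch.nn as nn

class MixedInitMLP(nn.Module):
    def __init__(self, in_dim: int = 784, hidden: int = 256, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)        # default (random) initialization
        self.fc2 = nn.Linear(hidden, hidden)        # default (random) initialization
        self.head = nn.Linear(hidden, num_classes)  # zero-initialized below
        nn.init.zeros_(self.head.weight)            # all classifier weights start at zero
        nn.init.zeros_(self.head.bias)              # all classifier biases start at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.fc1(x.flatten(1)))
        x = torch.relu(self.fc2(x))
        return self.head(x)

model = MixedInitMLP()
# Report which parameter tensors start at exactly zero.
for name, p in model.named_parameters():
    print(f"{name}: all-zero = {bool((p == 0).all())}")

In this particular mixed scheme the zero-initialized classifier still receives nonzero gradients from the first step (its gradient depends on the hidden activations, which are nonzero), so training can proceed despite the zero start.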