Spectral Compactness Ensures Robustness in Low-Precision Neural Networks

Abstract

Low-precision arithmetic (FP16/INT8/INT4) is increasingly required to deploy large neural networks on resource-constrained hardware, yet practitioners frequently observe training or inference failures such as NaN divergence and unstable gradient flow after quantization. This work proposes a unifying linear-algebraic explanation: \emph{the decay class of the singular-spectrum tail} of a weight matrix governs whether finite-precision perturbations accumulate benignly or catastrophically. We formalize a \emph{spectral compactness} condition via trace-norm (nuclear-norm) mass concentration and show that exponentially decaying singular spectra induce an explicit \emph{quantization threshold} that confines the dynamics to a numerically robust low-rank subspace. We then give a practical recipe---\emph{nuclear initialization} and trace-norm regularization---for enforcing spectral compactness in low-precision neural networks. Synthetic experiments that isolate spectral effects (diagonal spectra iterated in float32 and under coarse quantization) show large gains in (i) effective-rank compression (e.g., $82 \rightarrow 11$ to capture $\approx 90\%$ of trace-norm mass), (ii) eigenvalue distinguishability after quantization (e.g., $36.0\% \rightarrow 63.6\%$), and (iii) resistance to finite-precision dissipation at long iterative depth ($t=1000$), where heavy-tailed spectra collapse toward the floating-point floor. These results suggest that spectral-tail shaping is a computational necessity for robust low-bit deployment and a principled initialization/regularization tool for quantization-aware training and low-rank adaptation.
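
The effective-rank and distinguishability metrics in the abstract can be illustrated with a few lines of NumPy. The sketch below is not the authors' experimental code: the spectrum shapes, the 16-level (INT4-like) uniform quantizer, and the 90% trace-norm mass threshold are assumptions chosen only to mirror the described setup.

```python
import numpy as np

def effective_rank(sigma, mass=0.90):
    """Smallest r whose top-r singular values capture `mass` of the trace-norm."""
    cum = np.cumsum(sigma)  # sigma assumed sorted in descending order
    return int(np.searchsorted(cum, mass * cum[-1])) + 1

def quantize(x, n_levels=16):
    """Uniform quantizer on [0, max(x)] with 16 levels (an INT4-like grid)."""
    step = x.max() / (n_levels - 1)
    return np.round(x / step) * step

n = 128
spectra = {
    "exponential": np.exp(-0.25 * np.arange(n)),   # compact: fast tail decay
    "heavy-tailed": (1.0 + np.arange(n)) ** -0.5,  # slow power-law tail
}

for name, s in spectra.items():
    r = effective_rank(s)
    q = quantize(s)[:r]               # quantize, then keep the dominant subspace
    distinct = len(np.unique(q)) / r  # fraction still pairwise distinguishable
    print(f"{name:12s}  effective rank = {r:3d}/{n}  "
          f"distinct after quantization = {distinct:.0%}")
```

Under these illustrative assumptions, the exponentially decaying spectrum concentrates its trace-norm mass in far fewer directions, and those directions remain largely distinguishable after coarse quantization, whereas the heavy-tailed spectrum spreads its mass over many comparable values that collapse into a handful of shared quantization bins.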
