Spectral Compactness Ensures Robustness in Low-Precision Neural Networks
Abstract
Low-precision arithmetic (FP16/INT8/INT4) is increasingly required to deploy large neural networks on resource-constrained hardware, yet practitioners frequently observe training or inference failures such as NaN divergence and unstable gradient flow after quantization. This work proposes a unifying linear-algebraic explanation: \emph{the tail class of the singular spectrum} of weight matrices governs whether finite-precision perturbations accumulate benignly or catastrophically. We formalize a \emph{spectral compactness} condition using trace-norm (nuclear-norm) mass concentration and show that exponentially decaying singular spectra induce an explicit \emph{quantization threshold} that confines dynamics to a numerically robust low-rank subspace. We then give a practical recipe---\emph{nuclear initialization} and trace-norm regularization---to enforce spectral compactness in low-precision neural networks. Synthetic experiments isolating spectral effects (diagonal spectra under float32 iteration and coarse quantization) show large gains in (i) effective rank compression (e.g., $82 \rightarrow 11$ to capture $\approx 90\%$ of trace-norm mass), (ii) eigenvalue distinguishability after quantization (e.g., $36.0\% \rightarrow 63.6\%$), and (iii) resistance to finite-precision dissipation over long iterative depth ($t=1000$) where heavy-tailed spectra collapse toward the floating-point floor. These results suggest that spectral-tail shaping is a computational necessity for robust low-bit deployment and a principled initialization/regularization tool for quantization-aware training and low-rank adaptation.
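The two diagnostics the abstract reports (effective rank needed to capture $\approx 90\%$ of trace-norm mass, and the fraction of singular values that remain distinguishable after coarse quantization) can be illustrated with a minimal NumPy sketch. The decay rates, spectrum length, bit width, and uniform quantizer below are illustrative assumptions, not the paper's experimental protocol:

```python
import numpy as np

def effective_rank(s, mass=0.9):
    """Smallest k such that the top-k singular values hold `mass` of the nuclear norm."""
    c = np.cumsum(np.sort(s)[::-1])
    return int(np.searchsorted(c, mass * c[-1]) + 1)

def quantize(x, bits=4):
    """Uniform quantizer over [0, max(x)] -- a stand-in for coarse low-bit rounding."""
    step = x.max() / (2**bits - 1)
    return np.round(x / step) * step

n = 256
exp_spec = np.exp(-0.05 * np.arange(n))          # exponentially decaying spectrum
heavy_spec = (1.0 + np.arange(n)) ** -1.1        # heavy-tailed (power-law) spectrum

for name, s in [("exponential", exp_spec), ("heavy-tailed", heavy_spec)]:
    distinct = np.unique(quantize(s)).size / n   # distinguishability after quantization
    print(f"{name:12s}  rank@90%: {effective_rank(s):3d}  distinct: {distinct:.3f}")
```

On spectra like these, the exponentially decaying case concentrates its trace-norm mass in far fewer directions than the heavy-tailed case, which is the compactness property the abstract argues makes low-bit perturbations benign.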