Silent collapse in large neural networks: standard evaluation conceals systematic reasoning failure
Abstract
Fine-tuned neural networks can achieve near-perfect scores on standard benchmarks while systematically relying on spurious shortcuts rather than genuine reasoning—a phenomenon we term ‘silent collapse’. Through controlled experiments across four architecture families (86M–14B parameters), six tasks, and two modalities, we show that silent collapse becomes more severe with increasing model scale: larger models require progressively tighter training constraints to maintain genuine reasoning capability, with the optimal trainable fraction falling from ~50% at 160M to ~15% at 6.9B parameters. We experimentally tested two prospective predictions on models of up to 14 billion parameters; the results were largely consistent with the predicted trends. Evaluation of widely deployed models reveals that a leading NLI classifier achieves 90% on standard benchmarks yet performs at chance level under adversarial evaluation (I_wild = 0.37). Together, these results show that standard benchmarks can be non-diagnostic for shortcut reliance at scale, and that calibrated constraint provides a practical way to make fine-tuning outcomes reliably reproducible.