Silent collapse in large neural networks: standard evaluation conceals systematic reasoning failure

Abstract

Fine-tuned neural networks can achieve near-perfect scores on standard benchmarks while systematically relying on spurious shortcuts rather than genuine reasoning—a phenomenon we term ‘silent collapse’. Through controlled experiments across four architecture families (86M–14B parameters), six tasks, and two modalities, we show that silent collapse becomes more severe with increasing model scale: larger models require progressively tighter training constraints to maintain genuine reasoning capability, with the optimal trainable fraction falling from ~50% at 160M parameters to ~15% at 6.9B. We prospectively tested two predictions on models up to 14 billion parameters; the results were largely consistent with the predicted trends. Evaluation of widely deployed models reveals that a leading NLI classifier achieves 90% on standard benchmarks yet performs at chance level under adversarial evaluation (I_wild = 0.37). Together, these results show that standard benchmarks can be non-diagnostic for shortcut reliance at scale, and that calibrated constraint provides a practical way to make fine-tuning outcomes reliably reproducible.
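As a rough illustration (not taken from the paper), the two reported endpoints—an optimal trainable fraction of ~50% at 160M parameters and ~15% at 6.9B—are consistent with a power-law decay in parameter count. The sketch below fits a hypothetical model f(N) = a·N^b through those two points; the functional form and the extrapolation are assumptions for illustration only.

```python
import math

# Hypothetical power-law model f(N) = a * N**b for the optimal trainable
# fraction f at parameter count N, fit through the two endpoints quoted
# in the abstract (~50% at 160M, ~15% at 6.9B). Illustrative only; the
# paper's actual scaling relationship may differ.
n1, f1 = 160e6, 0.50
n2, f2 = 6.9e9, 0.15

b = math.log(f2 / f1) / math.log(n2 / n1)  # fitted exponent (negative: fraction shrinks with scale)
a = f1 / n1**b                             # scale coefficient

def optimal_fraction(n_params: float) -> float:
    """Interpolated/extrapolated optimal trainable fraction (hypothetical fit)."""
    return a * n_params**b

print(f"exponent b       = {b:.3f}")
print(f"f(14B), extrapolated = {optimal_fraction(14e9):.3f}")
```

Under this assumed fit, extrapolating to the 14B-parameter models used in the prospective tests would predict an even smaller optimal trainable fraction, which is one way the abstract's "progressively tighter training constraints" claim could be made quantitative.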