Scaffolded representation learning in deep networks
Abstract
Deep networks learn coarse structure before fine-grained distinctions, yet whether coarse structure actively scaffolds later differentiation remains untested. Here we show that representations assemble through a load-bearing scaffold. Tracking features at per-sample resolution across 55 runs, three architecture families, and two training datasets, we find a reproducible three-phase program: task-general features emerge and dominate first, superclass groupings form next, and class-level distinctions develop last. Selectively corrupting superclass boundaries impairs later differentiation, suggesting that fine-grained learning depends on the coherence of coarser representations. Conversely, a curriculum that pre-builds the scaffold reduces differentiation cost 6.7-fold while nearly preserving accuracy and halving overfitting. These findings connect critical learning periods, neural collapse, progressive differentiation, the lottery ticket hypothesis, and catastrophic forgetting within a single developmental account, and yield training diagnostics relevant to curriculum design, transfer timing, and mechanistic interpretability.
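The curriculum intervention can be pictured as a two-phase training loop: optimize against superclass targets first, then switch the loss to fine-grained class targets. The sketch below is a minimal PyTorch illustration, not the authors' implementation; the two-head design, the `fine_to_super` mapping, the `feature_dim` attribute, and all hyperparameters are assumptions for a CIFAR-100-style label hierarchy.

```python
# Minimal coarse-to-fine curriculum sketch, assuming a dataset whose
# fine labels map onto superclasses (e.g. a CIFAR-100-style hierarchy).
# All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def curriculum_train(model, loader, fine_to_super, coarse_epochs,
                     fine_epochs, num_super, num_fine, device="cpu"):
    """Phase 1 trains against superclass targets (pre-building the
    scaffold); phase 2 switches to fine-grained class targets."""
    # Two linear heads over one shared backbone, one per label granularity.
    # Assumes `model` exposes a `feature_dim` attribute and returns features.
    coarse_head = nn.Linear(model.feature_dim, num_super).to(device)
    fine_head = nn.Linear(model.feature_dim, num_fine).to(device)
    mapping = torch.as_tensor(fine_to_super, device=device)  # fine -> super

    opt = torch.optim.SGD(
        list(model.parameters()) + list(coarse_head.parameters())
        + list(fine_head.parameters()),
        lr=0.1, momentum=0.9)

    for epoch in range(coarse_epochs + fine_epochs):
        coarse_phase = epoch < coarse_epochs
        for x, y_fine in loader:
            x, y_fine = x.to(device), y_fine.to(device)
            feats = model(x)  # shared backbone features
            if coarse_phase:
                # Superclass loss: derive coarse targets from fine labels.
                loss = F.cross_entropy(coarse_head(feats), mapping[y_fine])
            else:
                # Fine-grained loss on the original class labels.
                loss = F.cross_entropy(fine_head(feats), y_fine)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

The design choice to keep separate heads while sharing the backbone mirrors the abstract's claim: the coarse phase shapes backbone representations (the scaffold) that the fine phase then differentiates, rather than the coarse task being an end in itself.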