Scaffolded representation learning in deep networks

Abstract

Deep networks learn coarse structure before fine-grained distinctions, yet whether coarse structure actively scaffolds later differentiation remains untested. Here we show that representations assemble through a load-bearing scaffold. Tracking features at per-sample resolution across 55 runs, three architecture families, and two training datasets, we find a reproducible three-phase program: task-general features emerge and dominate first, superclass groupings form next, and class-level distinctions develop last. Selectively corrupting superclass boundaries impairs later differentiation, suggesting that fine-grained learning depends on the coherence of coarser representations. Conversely, a curriculum that pre-builds the scaffold reduces differentiation cost 6.7-fold while nearly preserving accuracy and halving overfitting. These findings connect critical learning periods, neural collapse, progressive differentiation, the lottery ticket hypothesis, and catastrophic forgetting within a single developmental account, and they yield training diagnostics relevant to curriculum design, transfer timing, and mechanistic interpretability.
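
The scaffold-first curriculum described in the abstract can be pictured as a two-phase training loop: first optimize a shared backbone against superclass targets, then switch the same backbone to fine-grained class labels. The sketch below is a minimal illustration under assumptions, not the authors' implementation; the TwoHeadNet architecture, the fine_to_coarse mapping, and the phase lengths are all hypothetical choices introduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    """Shared backbone feeding separate superclass and class heads."""
    def __init__(self, in_dim, hidden, n_coarse, n_fine):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.coarse_head = nn.Linear(hidden, n_coarse)
        self.fine_head = nn.Linear(hidden, n_fine)

    def features(self, x):
        return self.backbone(x)

def coarse_to_fine_curriculum(model, loader, fine_to_coarse,
                              coarse_epochs=10, fine_epochs=40, lr=1e-3):
    # Phase 1: train against superclass targets (pre-build the scaffold).
    # Phase 2: switch to fine-grained class targets.
    # `fine_to_coarse` is a LongTensor mapping each fine label to its superclass.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(coarse_epochs + fine_epochs):
        in_coarse_phase = epoch < coarse_epochs
        for x, y_fine in loader:
            feats = model.features(x)
            if in_coarse_phase:
                logits = model.coarse_head(feats)
                target = fine_to_coarse[y_fine]  # collapse labels to superclasses
            else:
                logits = model.fine_head(feats)
                target = y_fine
            loss = F.cross_entropy(logits, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

Keeping a separate coarse head means the fine head stays untrained until phase 2, so any benefit of the coarse phase must flow through the shared backbone features, which mirrors the scaffolding claim. In this sketch, corrupting superclass boundaries would correspond to permuting the fine_to_coarse mapping during phase 1; whether this matches the paper's intervention is an assumption.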