Critical Depth and the Scaling Law Paradox: A Refactored Resource Model

Abstract

We present a refined interpretation of neural scaling laws inspired by phase transitions, taking as our starting point the paper "A Resource Based Model For Neural Scaling Laws". The \( \ell \propto N_p^{-1/3} \) power law has both empirical and theoretical backing. By formalizing the model through a combination of propositional logic and an SMT solver, we draw new perspectives on the learning curve; in particular, the critical depth conjecture, which the prior work formulated and relied on for its internal consistency, is strengthened here. Combining this logical formalization with empirical and theoretical insights, we arrive at a three-regime, physics-inspired profile of neural scaling laws: 1) a structural phase, below the critical depth, where we argue the loss scales as \( \ell \propto N_p^{-2/3} \); 2) a redundancy phase, above the critical depth, where the loss follows the classical \( \ell \propto N_p^{-1/3} \) (the regime in which most current LLMs operate); and 3) an optimized trajectory in which depth is fixed and scaling proceeds through width, following \( \ell \propto N_p^{-1/3} \).
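The two parameter-count regimes described above can be pictured as a continuous piecewise power law in \( N_p \). The sketch below is purely illustrative: the function name, the constant `c`, and the critical parameter count `n_crit` are assumptions for demonstration, not values taken from the paper.

```python
import numpy as np

def scaling_loss(n_params, n_crit=1e8, c=1.0):
    """Illustrative two-branch loss profile (hypothetical constants).

    Below a critical parameter count n_crit, the loss falls as N^(-2/3)
    (structural phase); above it, as N^(-1/3) (redundancy phase).
    The prefactor of the second branch is chosen so the two branches
    meet at n_crit, making the curve continuous.
    """
    n = np.asarray(n_params, dtype=float)
    structural = c * n ** (-2.0 / 3.0)
    # Match the branches at n_crit so the profile has no jump.
    c_redundant = c * n_crit ** (-1.0 / 3.0)
    redundancy = c_redundant * n ** (-1.0 / 3.0)
    return np.where(n < n_crit, structural, redundancy)
```

Plotting this on log-log axes gives a straight line of slope \(-2/3\) that bends to slope \(-1/3\) at the critical point, which is the qualitative signature of the proposed phase transition.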
