Does distilling Bayesian priors into language models support rapid language learning? Comment on McCoy and Griffiths (2025)

Abstract

Language models (LMs) are typically trained on orders of magnitude more language data than children are exposed to.1 Although there have been several attempts to show that LMs with no language-specific priors can learn core aspects of language when trained on a diet of human data, in all cases the models were given much more data than children receive and set easier learning tasks.2 Thus, McCoy and Griffiths’ finding that LMs can learn English rapidly when Bayesian language priors are distilled into them is an important result if correct.3 It would suggest that LMs can be used as models of human language acquisition when endowed with something akin to a Universal Grammar.4 However, the authors’ simulations do not support this conclusion.
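For readers unfamiliar with prior distillation, the toy Python sketch below illustrates only the general idea of baking a prior over languages into a model by training it on sentences sampled from many languages drawn from that prior. Everything in it is a simplifying assumption made for illustration: the toy prior (random vocabularies and sentence lengths), the bigram count model standing in for a neural LM, and the function names (sample_language, distill_prior) are hypothetical and are not taken from McCoy and Griffiths’ paper, whose actual procedure is considerably more sophisticated.

```python
from collections import Counter
import random


def sample_language(rng):
    """Draw a toy 'language' from a hypothetical prior: a small random
    vocabulary and a maximum sentence length."""
    vocab = rng.sample(list("abcdefghij"), k=rng.randint(2, 5))
    max_len = rng.randint(3, 6)
    return vocab, max_len


def sample_sentence(language, rng):
    """Sample one sentence from a toy language."""
    vocab, max_len = language
    return "".join(rng.choice(vocab) for _ in range(rng.randint(1, max_len)))


def distill_prior(n_languages=1000, sentences_per_language=20, seed=0):
    """'Distill' the prior by accumulating bigram counts over sentences drawn
    from many languages sampled from the prior, so the resulting statistics
    reflect the prior itself rather than any single language."""
    rng = random.Random(seed)
    bigrams = Counter()
    for _ in range(n_languages):
        language = sample_language(rng)
        for _ in range(sentences_per_language):
            sentence = "#" + sample_sentence(language, rng)  # '#' marks sentence start
            bigrams.update(zip(sentence, sentence[1:]))
    return bigrams


if __name__ == "__main__":
    prior_counts = distill_prior()
    print("Most common bigrams under the distilled prior:",
          prior_counts.most_common(5))
```

The sketch is meant only to make the phrase “distilling a prior” operationally concrete: the model’s statistics come from the prior over languages rather than from any one target language, which is then learned from a comparatively small amount of adaptation data.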
