Does distilling Bayesian priors into language models support rapid language learning? Comment on McCoy and Griffiths (2025)
Abstract
Language models (LMs) are typically trained on orders of magnitude more language data than children are exposed to.1 Although there have been several attempts to show that LMs with no language-specific priors can learn core aspects of language when trained on a diet of human data, in all cases the models were provided with much more data than children receive and given easier learning tasks.2 Thus, McCoy and Griffiths’ finding that LMs can learn English rapidly when distilled with Bayesian language priors is an important result, if correct.3 It would suggest that LMs can be used as models of human language acquisition when endowed with something akin to a Universal Grammar.4 However, the authors’ simulations do not support this conclusion.