Does distilling Bayesian priors into language models support rapid language learning? Comment on McCoy and Griffiths (2025)
Abstract
Language models (LMs) are typically trained on orders of magnitude more language data than children are exposed to.1 Although there have been several attempts to show that LMs with no language-specific priors can learn core aspects of language when trained on a diet of human data, in all cases the models were provided with much more data than children receive and given easier learning tasks.2 Thus, McCoy and Griffiths’ finding that LMs can learn English rapidly when distilled with Bayesian language priors is an important result, if correct.3 It would suggest that LMs can be used as models of human language acquisition when endowed with something akin to a Universal Grammar.4 However, the authors’ simulations do not support this conclusion.