Bigger is not always better: The importance of human-scale language modeling for psycholinguistics

Abstract

Neural network language models can learn a surprising amount about language by predicting upcoming words in a corpus. Recent work in language technologies has demonstrated that large performance improvements can arise from simply increasing ("scaling") the size of the data sets models are trained on (and, correspondingly, the number of parameters in those models); accordingly, many contemporary systems are trained on trillions of words. While largely beneficial to performance on language applications, scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by scaling, as well as the benefits that would result from human-scale language modeling research. In the second half of this paper, we report on takeaways from two efforts to bring about human-scale language model pretraining. First, we report on the first iteration of the BabyLM Challenge, a shared task organized by the authors that asked participants to train a language model on 100 million words or less. Second, we present experiments to answer open questions raised by the findings of the BabyLM Challenge: namely, are significant computational resources required to achieve high performance, even at such small scales? We find that high performance can be achieved at small data scales and with typical academic-scale computational resources.