Bigger is not always better: The importance of human-scale language modeling for psycholinguistics

Abstract

Neural network language models can learn a surprising amount about language by predicting upcoming words in a corpus. Recent work in language technologies has demonstrated that large performance improvements can arise from simply increasing ("scaling") the size of the data sets models are trained on (and, correspondingly, the number of parameters in those models); accordingly, many contemporary systems are trained on trillions of words. While largely beneficial to performance on language applications, scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by scaling, as well as the benefits that would result from human-scale language modeling research. In the second half of this paper, we report on takeaways from two efforts to bring about human-scale language model pretraining. First, we report on the first iteration of the BabyLM Challenge, a shared task organized by the authors that asked participants to train a language model on 100 million words or less. Second, we present experiments to answer open questions raised by the findings of the BabyLM Challenge: namely, are significant computational resources required to achieve high performance, even at such small scales? We find that high performance can be achieved at small data scales and with typical academic-scale computational resources.