Illusions of Alignment Between Large Language Models and Brains Emerge From Fragile Methods and Overlooked Confounds
Abstract
Emerging research seeks to draw neuroscientific insights from the neural predictivity of large language models (LLMs). However, as results continue to be generated at a rapid pace, there is a growing need for large-scale assessments of their robustness. Here, we analyze a wide range of models, methodological approaches, and neural datasets. We find that some methodological approaches, particularly the use of shuffled train-test splits, have led to many impactful yet unreliable findings, and that the method by which activations are extracted from LLMs can bias results to favor particular model classes. Moreover, we find that confounding variables, particularly positional signals and word rate, perform competitively with trained LLMs and fully account for the neural predictivity of untrained LLMs. In summary, our results suggest that theoretically interesting connections between LLMs and brains on three neural datasets are driven largely by fragile methodologies and overlooked confounds.
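To make the shuffled-split failure mode concrete, the sketch below (an illustrative assumption, not the paper's actual pipeline or data) regresses a temporally autocorrelated "neural" signal onto a purely positional feature set. With a shuffled train-test split, held-out timepoints sit between temporally adjacent training timepoints, so even a model carrying only positional information appears predictive; a contiguous held-out block removes that leakage. All names and parameters here (`T`, the Gaussian-bump basis, the ridge penalty) are hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# "Neural" signal: slow, temporally autocorrelated (smoothed noise),
# with no relationship to any stimulus or model feature.
y = np.convolve(rng.standard_normal(T), np.ones(100) / 100, mode="same")

# "Model" features: Gaussian bumps tiling time, i.e. a purely positional signal.
t = np.arange(T)
centers = np.linspace(0, T, 20)
X = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 40.0) ** 2)

def ridge_predict(X_tr, y_tr, X_te, lam=1e-3):
    # Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1]), X_tr.T @ y_tr)
    return X_te @ w

def held_out_corr(train_idx, test_idx):
    pred = ridge_predict(X[train_idx], y[train_idx], X[test_idx])
    return np.corrcoef(pred, y[test_idx])[0, 1]

# Shuffled split: test timepoints interleaved with adjacent training timepoints,
# so temporal autocorrelation leaks across the split.
perm = rng.permutation(T)
r_shuffled = held_out_corr(perm[:800], perm[800:])

# Contiguous split: a held-out block at the end, no temporal leakage.
r_contiguous = held_out_corr(t[:800], t[800:])

print(f"shuffled split r = {r_shuffled:.2f}, contiguous split r = {r_contiguous:.2f}")
```

The positional features predict the held-out signal far better under the shuffled split than under the contiguous one, despite containing no information about the signal itself, which is the pattern of inflated, unreliable predictivity the abstract attributes to fragile methodology.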