Can computers understand words like humans do? Comparable semantic representation in neural and computer systems
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (eLife)
Abstract
Semantic representation has been studied independently in neuroscience and computer science. A deep understanding of human neural computations and the pursuit of strong artificial intelligence both call for joint efforts in the language domain. We investigated comparable representational formats of lexical semantics in these two complex systems using neural recordings with fine temporal resolution. We found that semantic representations generated by computational models correlated significantly with EEG responses at an early stage of the typical semantic processing time window in a two-word semantic priming paradigm. Moreover, three representative computational models differentially predicted EEG responses along the time course of word processing. Our study provides a finer-grained understanding of the neural dynamics underlying semantic processing and offers an objective biomarker for assessing human-like computation in computational models. This novel framework blazes a promising trail for bridging disciplines in the investigation of higher-order cognitive functions in human and artificial intelligence.
Article activity feed
-
Author Response:
We thank the editor and the reviewers for their feedback on our manuscript.
Our project aimed to join forces across neuroscience and computer science, advancing a finer-grained understanding of how lexical meanings are processed by human and artificial intelligence. As the reviewers correctly pointed out, enormous effort has been devoted to this question in each research domain. Historically, however, this progress has developed independently in cognitive neuroscience and in artificial intelligence within computer science. At the current stage of research, the need to integrate these two lines of work is more urgent than ever. Bridging two research domains, however, is a completely different ball game, requiring a novel theoretical framework, innovative experimentation, and new datasets.
The current stage of artificial intelligence is, by its nature, statistical mapping between inputs and outputs, without any true intellectual processing involved (Yann LeCun). To bridge two complex systems (e.g., the human brain and computers), the first step is to find a common ground for representing information. For example, in the domain of vision, joint efforts between computer science and neuroscience have recently established mappings between features in different layers of deep neural network models and neural representations in the visual hierarchy. However, in another important domain of artificial intelligence, natural language processing (NLP), advances are still scarce, because a fine-grained understanding of both the dynamics of brain responses and the underlying mechanisms of NLP models has yet to be established. In this study, we proposed a novel research framework that investigates a possible common lexical-semantic representation in the human brain and computers, which serves as the first and fundamental step toward bridging these two research domains.
Experimentally, we optimized the classic lexical-semantic paradigm and developed novel research methods to investigate the common representations between the brain and computers. Specifically, we used a two-word semantic priming paradigm with electroencephalography (EEG) recordings to quantify the dynamic processing of human language comprehension in a most basic setting. We then evaluated three computational models by correlating neural data with model-generated semantic similarity scores for the same word pairs, using a novel single-trial EEG correlation analysis. We agree with the reviewers that this study has many aspects that can be improved, just like all studies that aim to open a new research direction. To our knowledge, this is the first attempt to create a natural, dynamic, neural dataset for evaluating computational models in the linguistic domain, thus paving a new way toward a full understanding of the general computational mechanisms of language processing across complex systems.
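The core analysis described above, correlating trial-wise cosine similarity of word-embedding pairs with single-trial EEG amplitudes at each time point, can be sketched as follows. This is a minimal illustration with randomly generated toy data; the array shapes, trial counts, and the choice of a single channel are assumptions for demonstration, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_r(x, y):
    # Pearson correlation as the mean product of z-scores.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return float(np.mean(x * y))

# Toy setup: 50 word-pair trials, 300-dim embeddings, 600 EEG time points.
n_trials, dim, n_times = 50, 300, 600
primes = rng.standard_normal((n_trials, dim))
targets = rng.standard_normal((n_trials, dim))
eeg = rng.standard_normal((n_trials, n_times))  # single-trial amplitudes, one channel

# Trial-wise semantic similarity between prime and target embeddings.
sim = np.array([cosine_similarity(p, t) for p, t in zip(primes, targets)])

# Correlate similarity with EEG amplitude at every time point,
# yielding a correlation time course to test for significant windows.
r_timecourse = np.array([pearson_r(sim, eeg[:, t]) for t in range(n_times)])
```

In practice each of the three models (CBOW, GloVe, CWE) would supply its own `sim` vector, producing one correlation time course per model for comparison.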
-
Summary: There was general enthusiasm for exploring approaches to semantic relationships in language, and for the quantitative comparison of different modeling approaches. There were questions on the degree to which the current results tied in to past literature of semantic processing, which seemed like it could have been better integrated, to help make current advances in theory more clear. As one example, the overall framing to try to link computational models and neural processing seemed to be a stretch given the data.
Reviewer #1:
In this paper the authors examine neural representations of semantic information based on EEG recordings on 25 subjects on a two-word priming paradigm. The overall topic of how meaning is represented in the brain, and particularly the effort to understand this on a rapid timescale, is an important one. Although presented thoroughly, the analyses did not make a convincing step beyond prior investigations in linking semantic models to neuroscientific theories of meaning representation.
Linking word embeddings / high-dimensional semantic spaces to brain data has been done before in both fMRI and M/EEG (some of these papers are cited here). That is, the potential to link these two types of data has been demonstrated. So, an important question is what key advance the current data provide. This could be either a deep dive into the representational spaces of the language models, or using the models to advance our understanding of semantic representation in the brain. Unfortunately, I was not convinced that either of these was realized.
One important contribution seems to be the use of three word embedding models (i.e. three semantic spaces): CBOW, GloVe, and CWE. Although these are described briefly (L89 and following) the nature of the different predictions was not spelled out, and thus the different (contradictory or complementary) aspects of these models were not immediately clear. In other words, by the end of the paper it wasn't clear whether we learned anything about these models we didn't know before.
The relationship of the reported ERP findings to contemporary views of semantic memory was lacking. There is a large literature on semantic memory that goes far beyond the N400. I don't mean to imply that the authors need to address ALL of it, but right now it is difficult to get even a sanity check on whether the topographic/neuroanatomical distributions for the models are reasonable. This difficulty also leads to some questions with the methods - for example, averaging model-brain correlations across all channels. Given that some channels are likely to be more informative than others, I'm not convinced the overall average is a good metric. All told, a greater link between the language models and neural responses is needed (i.e. a clearer link to frameworks for semantic memory).
Reviewer #2:
Summary and General Assessment:
25 participants performed a visual primed lexical decision task while EEG was recorded. The authors correlate the EEG-recorded neural activity with three different methods of deriving word embedding vectors. The goal was to investigate semantic processing in the brain, using metrics that have been derived using NLP tools. The main finding is that neural activity during the same time-window (~200-300 ms) that has been associated with semantic processing in classic EEG literature - the so-called N400 component - was significantly modulated by semantic similarity between the prime and target pairs as quantified by the word embeddings. The authors claim, therefore, that brains and machines have similar representations of semantics in their processing.
My main concern, highlighted below, is that the claims exceed the findings of the paper. I believe that the current results nicely recapitulate the classic N400 literature using a continuous variable rather than a categorical design, but do not significantly contribute to our understanding of semantic processing in AI and humans.
Major comments:
- Magnitude of claims
My main concern is that the authors are claiming interpretations that are much broader than the experimental design and results can support. The experimental design adapts a classic lexical-decision priming paradigm, using the cosine-similarity in the word-embeddings as the index of semantic similarity between prime and target. They replicate an N400 result using this continuous measure rather than a categorical one. While this is interesting, it does not, in my view, contribute to the discussion of the similarity between brains and AI. Instead, it demonstrates that co-occurrence metrics can be used as proxies for semantic similarity between word pairs.
- Analytic rigor
I also have concerns regarding the analysis techniques selected. The authors primarily analyse activity as recorded from a single electrode, or average the data across all electrodes. The results across electrodes are shown for visualisation purposes only, with no statistics. I would suggest instead applying a spatio-temporal permutation test to incorporate the spatial dimension.
Relatedly, even though justification is given for primarily analysing data recorded from channel Cz based on previous N400 studies, it seems that a lot of the analyses are actually applied on Oz (e.g. line 288, and in Figure 4 caption). Is this a typo, or was the analysis indeed applied to Oz?
The duration of the effects using the temporal cluster test are very short, in some cases less than 10 ms in duration. A priori, we would expect meaningful measures of semantic processing to be of a much longer duration.
- Completeness of description of analysis
I found the reporting of the statistical results very much under-specified. Although behavioural analyses are sufficiently reported, EEG-analyses are not. I found no report of effect sizes, and specific p-values were missing in many cases.
Reviewer #3:
The study analyzed EEG responses to visually presented noun-noun pairs. Priming effects were estimated by subtracting the response to the same noun presented in prime position from the response in target position. These priming effects were then correlated with the cosine distance computed from 3 variations on a word embedding model.
Semantic distances from word embedding models have been previously shown to predict brain responses (papers cited on line 74, but also work by Stefan Frank, e.g. Frank & Willems, 2017; Frank & Yang, 2018). The main text argues that previous studies, which used whole sentence stimuli, confound semantic composition with semantic representations, and that the innovation of the present study is that it uses a semantic priming paradigm to access "pure" (79) semantic representations.
My main concern is that the conclusions are not supported by the data (point 1 below). I also have some concerns about the methods. In my view the data and analysis approach could potentially be interesting, but the framing would need to be quite different to emphasize conclusions that are appropriate for the evidence (and probably more modest).
- Interpretation of the results
The main claim of the manuscript is that the correlations imply "Comparable semantic representation in neural and computer systems" (title), repeated as "common semantic representations between [the] two complex systems" (300 ff.) and "human-like computation in computational models" (13). This conclusion is not warranted by the results. The word embedding models are essentially (by design) statistical co-occurrence models. It has also long been known that humans, and N400s specifically, are sensitive to language statistics (e.g., Kutas & Federmeier, 2011). The correlation is thus parsimoniously explained by the fact that both systems are sensitive to lexical co-occurrence statistics. The (implicit) null hypothesis that is rejected is merely that human responses are entirely insensitive to these co-occurrence patterns. The alternative hypothesis does not by itself imply any deeper similarity in the representational format. Similarly, the comparison of correlations with different word embedding models can potentially tell us something about which specific co-occurrence patterns humans are sensitive to, but it does not by itself imply any deeper similarity of the representations.
- Methods
The Methods section leaves open several crucial questions.
2-A) Data was recorded from multiple subjects. However, the dependent variable was a correlation coefficient between single-trial ERP and trial-wise semantic dissimilarity. How did this model account for the multi-level structure of the data?
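One common way to respect the multi-level structure the reviewer asks about is a two-stage summary-statistics approach: compute a correlation per subject, stabilize with the Fisher z-transform, and test the per-subject values against zero at the group level. The sketch below illustrates this with hypothetical per-subject correlation values; it is one standard option, not necessarily what the authors did.

```python
import numpy as np

def fisher_z(r):
    # Variance-stabilizing transform for correlation coefficients.
    return np.arctanh(r)

# Hypothetical per-subject brain-model correlations at one time point
# (first-level statistic, one value per subject).
subject_rs = np.array([0.12, 0.08, 0.15, -0.02, 0.10, 0.07, 0.11, 0.05])

# Second level: one-sample t-test of the z-transformed values against zero,
# treating subject as the unit of analysis (random effects across subjects).
z = fisher_z(subject_rs)
n = len(z)
t_stat = z.mean() / (z.std(ddof=1) / np.sqrt(n))
```

Alternatives include a mixed-effects model with trials nested within subjects; the point of either approach is that trials from the same subject are not treated as independent observations.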
2-B) It is not clear that the results are corrected for multiple comparisons across the 600 time points. The threshold for significance in Figure 4B varies for each time point, whereas a critical feature of classical permutation tests is to aggregate the maximum statistic across time points to correct for multiple comparisons. The legend also indicates that the test was performed "at each time point" (4) without mentioning correction.
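The max-statistic correction the reviewer describes can be sketched as follows: on each permutation (here sign-flipping subjects' time courses), record the maximum |t| across all time points, and derive a single family-wise threshold from that null distribution. The data below are simulated; dimensions and the injected effect window are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated group data: Fisher-z correlations, subjects x time points.
n_subj, n_times = 20, 600
z = rng.standard_normal((n_subj, n_times)) * 0.1
z[:, 250:300] += 0.15  # injected "effect" window (illustrative)

def t_values(data):
    # One-sample t against zero at each time point.
    return data.mean(0) / (data.std(0, ddof=1) / np.sqrt(data.shape[0]))

observed = t_values(z)

# Sign-flip permutations: flip each subject's whole time course, and keep
# only the MAX |t| across all 600 time points per permutation.
n_perm = 1000
max_t = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))
    max_t[i] = np.abs(t_values(z * signs)).max()

# One FWER-corrected threshold applies to every time point, unlike a
# per-time-point threshold, which does not control the family-wise rate.
threshold = np.quantile(max_t, 0.95)
significant = np.abs(observed) > threshold
```

Any observed |t| exceeding `threshold` is significant at p < .05 corrected over all 600 comparisons, which is the single-threshold property the reviewer notes is missing from Figure 4B.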
2-C) The statistical analysis is even less clear when different models are compared (309 ff.). For a significant result, a p-value should be provided and, if possible, some estimate of effect size.
References
Frank, S. L., & Willems, R. M. (2017). Word predictability and semantic similarity show distinct patterns of brain activity during language comprehension. Language, Cognition and Neuroscience, 32(9), 1192-1203. https://doi.org/10.1080/23273798.2017.1323109
Frank, S. L., & Yang, J. (2018). Lexical representation explains cortical entrainment during speech comprehension. PLOS ONE, 13(5), e0197304. https://doi.org/10.1371/journal.pone.0197304
-