Modality-Specific and Amodal Language Processing by Single Neurons
Curation statements for this article:-
Curated by eLife
eLife Assessment
This study presents a large-scale characterization of single-neuron responses during reading and listening, enabling examination of both 'low-level' (orthographic/phonological) and 'higher-level' (syntactic) features, as well as links between single-neuron activity and multi-scale field potentials, making it a valuable resource for bridging micro- and macroscale accounts of language processing. The analyses identify modality-specific and putatively modality-independent responses across distributed brain regions, offering an intriguing framework for understanding how sensory-specific and abstract representations may relate. However, the evidence supporting the central claims is currently incomplete, due to limited population-level quantification, insufficient statistical characterization of how many neurons encode the relevant features, ambiguity in the interpretation of encoding model results, and a lack of rigorous tests of cross-modal generalization and alternative accounts, which together weaken the conclusions about amodal representations and hierarchical processing.
Abstract
According to psycholinguistic theories, during language processing, spoken and written words are first encoded along independent phonological and orthographic dimensions, then enter into modality-independent syntactic and semantic codes. Non-invasive brain imaging has isolated several cortical regions putatively associated with those processing stages, but lacks the resolution to identify the corresponding neural codes. Here, we describe the firing responses of over 1000 neurons, and mesoscale field potentials from over 1400 microwires and 1500 iEEG contacts in 21 awake neurosurgical patients with implanted electrodes during written and spoken sentence comprehension. Using forward modeling of temporal receptive fields, we determined which sensory or abstract dimensions are encoded. We observed a double dissociation between superior temporal neurons sensitive to phonemes and phonological features and previously unreported ventral occipito-temporal neurons sensitive to letters and orthographic features. We also discovered novel neurons, primarily located in middle temporal and inferior frontal areas, which are modality-independent and show responsiveness to higher linguistic features. Overall, these findings show how language processing can be linked to neural dynamics, across multiple brain regions at various resolutions and down to the level of single neurons.
Reviewer #1 (Public review):
Summary:
This paper presents rare and unique recordings of single neurons, LFPs, and SEEG data from human patients performing reading and listening tasks. They identify single neurons in temporal and ventral occipito-temporal cortex that respond specifically to spoken and written language, and primarily encode either phonological or orthographic features of the stimuli. They also identify neurons in the middle temporal and inferior frontal cortex that respond to both modalities, which they interpret as amodal language responses. In general, neuronal population firing rates are correlated with both micro- and macro- scale broadband gamma responses, though they observe some dissociations, particularly with the macro-scale. The results are interpreted to support a model of modality-specific to amodal processing throughout many distributed brain areas for language.
Strengths:
(1) The data are truly unique, providing a large-scale characterization of single neuron responses from the human brain during written and spoken language processing.
(2) The task and stimulus conditions allow for examination of both low-level (e.g., orthographic/phonological) and higher-level (e.g., syntactic) encoding.
(3) Showing relationships between single neuron and multi-scale LFP recordings from the same sites helps bridge neuronal and meso/macroscale literatures.
Weaknesses:
(1) My main comment about the paper is that it feels like a collection of somewhat random descriptions of a very small number of hand-picked single neurons. I think that the task and stimulus design shown in Figure 1A sets up some clear hypotheses that could be tested rigorously across the full neuronal population, but instead, the authors pick a few neurons and fit encoding models that don't take advantage of the contrasts. I agree that encoding models are a powerful approach, but with only 508 total words and what appears to be a limited set of variability across the various features, it's not clear to me that the stimuli, which were apparently designed as minimal pairs, provide enough power to find robust results. Perhaps this is why the majority of the results only show a very small number of units (most of which are actually buried in the supplement), but it's odd to me that they don't show the results of the minimal contrasts other than for length.
(2) Related to point (1), other than Figure 2H and Figure 6A-B, the results are only shown for a tiny number of units. This is great for demonstrating qualitatively what the effects look like, but there is no quantification of the findings across the population, which undermines the point in the abstract that 1000 neurons were recorded. This is acknowledged in some places, but as a reader, it leaves me wondering how seriously to take the interpretations if they seemingly cannot be replicated. I understand this is a challenge with human single neuron recordings, but as presented, the paper as a whole comes across as largely anecdotal.
(3) Some of the key claims rest on the idea that neurons were recorded from the superior temporal gyrus and fusiform gyrus. For the STG claim, I don't understand how this was done, or what specifically they mean by STG, since the microwire locations do not appear to be anywhere near the lateral surface. This makes sense given the profile of the Behnke-Fried electrodes, but if they want to claim that there are neurons from the STG, they need to be more specific and show where precisely these wires are. If they are more medial as it appears, they need to explain how they dissociated STG from Heschl's gyrus. Similarly, for the fusiform neurons, I can only see a couple of probes that appear to have their tips near where I would think this area is. Perhaps this is more of a visualization issue with Figure 1F, but overall, I am not convinced that the neurons are exactly where they say they are.
(4) Related to point (3), some of the authors have made strong claims in prior work about the precise coordinates of the VWFA, so it would help to know how many units are within this exact region. The ROIs marked in Figure 2 are quite large, and given results like Vinckier et al. 2007, it's important to know where along the hierarchy the recordings were actually performed. Similarly, given the framing in the intro around the VWFA as a key area, the idea that some of the best example neurons are from the right fusiform is a bit confusing. I don't think they can make the claims about visual hemifields since it does not appear that they recorded eye tracking to verify constant central fixation, and it may be a bit surprising to see such strong orthographic selectivity in the right hemisphere (though, as a result, it may suggest a more nuanced view of lateralization of reading at the single-neuron level).
(5) In many sections of the paper, there are vague and unquantified claims like "many neurons" or "a large number of units". This needs to be made explicit. It would also help to show where statistical threshold cutoffs are on plots like Figure 2H, since the "brain-score" is used to select units for many analyses.
(6) More detail on the TRF models is needed in the methods. At the very least, a complete list of the features in each group is necessary to evaluate claims about very broad sets of features like "syntax". It would also help to know how the features were coded, especially where there is a mixture of continuous and discrete features within the model.
(7) Depending on how exactly the features were defined, I'm skeptical of some of the claims, like position-specific "w". There are some obvious confounds that need to be controlled here, like whether word-initial "w" is strongly associated with shorter, higher frequency words (like "wh-" words). There are other examples, like whether specific forked letters tend to appear in certain syllables in English words. While it may be the case that these kinds of patterns are uniformly distributed, it needs to be established in this particular stimulus set.
(8) The claim that there is monotonic encoding of word length does not seem strongly supported in the data. In both PC1 and the single neuron examples, it seems like there may be a non-linear relationship, which could suggest that another correlated feature (e.g., word frequency) is involved.
Minor Points:
(1) What are "boundaries"? They are not described anywhere I could find, but they are a feature group that was used in the TRFs.
(2) The caption for Figure 6C says MTG and insula, but the text says MTG and IFG. Similar to the above comment about STG and fusiform, it's not clear to me how they achieved single-unit recordings with Behnke-Fried probes in these areas.
(3) The somewhat less robust correlations between firing rate and BGA in macro vs micro contacts are potentially interesting. However, did they verify that the closest macro contact was always in the gray matter of the same gyrus as the microwire?
Reviewer #2 (Public review):
Summary:
This manuscript, "Modality-Specific and Amodal Language Processing by Single Neurons," presents an intracranial electrophysiology study investigating how language is represented in the human brain across spoken and written modalities. The authors analyze activity from over one thousand single neurons and local field potentials recorded in twenty-one neurosurgical patients while participants read and listened to sentences. Using encoding models based on temporal receptive fields, they examine whether neural responses track modality-specific features, such as phonological and orthographic information, as well as higher-level linguistic features. The results are interpreted as evidence for a dissociation between modality-specific processing in sensory regions and modality-independent ("amodal") representations in temporal and frontal cortices, supporting a two-stage model of language processing.
Strengths:
This study uses a rare and valuable dataset, combining single-neuron recordings with broader field potential measures in human participants. The large-scale recording, in terms of both neuron count and anatomical coverage across multiple regions and individuals, represents a significant technical achievement for intracranial research.
The use of encoding models to relate neural activity to multiple levels of linguistic representation is methodologically rigorous and provides a unified framework to compare phonological, orthographic, and higher-level features. This approach allows the authors to systematically test how different aspects of language are represented across neurons and regions.
Another key strength is the attempt to directly link concepts from Linguistics to neural data. By framing the results in terms of modality-specific versus amodal representations, the study engages with longstanding theoretical questions and offers a potential bridge between linguistic theory and systems neuroscience.
The manuscript is also very well written, and the data are presented clearly and effectively. The inclusion of raw data and raster plots is particularly valuable, as it allows readers to directly assess the neural responses and strengthens the transparency of the analyses.
Weaknesses:
Despite these strengths, the central claims of the paper are not fully supported by the analyses presented, and several key issues limit the strength of the conclusions.
A primary concern is the lack of clear reporting and statistical characterization of the proportion of neurons that significantly encode the tested linguistic features. While the paper presents illustrative examples and regional patterns of encoding, it does not systematically quantify how many neurons exhibit significant effects across conditions, nor does it provide formal statistical comparisons of these proportions across brain regions or feature types. As a result, it is difficult to determine whether the reported dissociations reflect robust population-level phenomena or relatively sparse subsets of neurons identified through model fitting. Figure 2H offers a visual depiction of the distribution of Brain-Score (a measure of model evaluation) across the fusiform gyrus and superior temporal gyrus, but it falls short of providing formal statistical testing or quantitative summaries, limiting its interpretability in supporting the authors' claims. Given that the authors employ temporal receptive field (TRF) analyses, the framework naturally allows for straightforward quantification of the proportion of neurons that significantly encode any linguistic features in the model, which could be reported by region as well as by stimulus condition (auditory vs. visual). Including such analyses would further strengthen the population-level interpretation of the results.
Relatedly, the interpretation of "amodal" neurons is not sufficiently substantiated. The classification of neurons as modality-independent relies on encoding model performance across conditions, but the statistical criteria for establishing cross-modal generalization are not always clearly defined or rigorously tested. Without explicit comparisons (e.g., testing whether the same neurons significantly encode features in both modalities above chance, and whether this exceeds what would be expected under appropriate null models), the claim of modality-independent representation remains somewhat underdetermined.
More generally, the reliance on encoding models introduces some interpretational ambiguity. Although the observed dissociation between fusiform and superior temporal regions is consistent with orthographic and phonological processing, respectively, the feature spaces used in the models are partially linked to lower-level sensory properties (e.g., visual form and acoustic features). The authors' single-neuron results suggest these effects reflect genuine linguistic selectivity, but the findings do not uniquely distinguish between linguistic and perceptual explanations. While fully disentangling these factors may be beyond the scope of the current study, the manuscript could benefit from a brief discussion acknowledging these correlations or clarifying how lower-level sensory contributions were considered.
Another limitation is that the proposed two-stage model of language processing is not directly tested against competing hypotheses. While the dissociation between modality-specific and amodal representations is consistent with this model, the authors note that higher-level features, such as syntax, may be encoded in a distributed or overlapping manner. These possibilities are not systematically tested, so the conclusions risk overinterpreting correlational patterns as evidence for a specific processing hierarchy. A more explicit discussion or quantitative consideration of these alternative accounts would strengthen the interpretation, while still allowing the two-stage model to be presented as a plausible framework.
Reviewer #3 (Public review):
Summary
This paper analyzes human single-neuron activity recorded with Behnke-Fried electrodes during naturalistic listening and reading. The authors demonstrate a double dissociation between superior temporal gyrus neurons (responsive during listening but not reading) and fusiform gyrus neurons (responsive during reading but not listening), and report that these two classes of neurons show selectivity to specific phonological and orthographic features of the stimulus, respectively. Across the language network, the authors also report neurons whose responses are amodal (active during both listening and reading), which they organize into a modal-to-amodal processing hierarchy. A separate thread of analyses tracks the relationship between single-neuron spiking, micro-wire, and macro-wire signals across these regions. The authors interpret their findings as evidence for hierarchical processing across the language network and for a "compositional code" for orthography in reading.
Strengths
The dataset is rare and valuable. Simultaneous single-neuron, micro-wire, and macro-wire recordings during naturalistic reading and listening in the same patients are difficult to obtain, and the experimental design reflects substantial care. The cross-modality comparison at single-neuron resolution is a novel measurement, and the paper presents these results while also situating them against prior neuroimaging and intracranial work. The simultaneous availability of signals at three spatial scales within the human language network is an unusual and potentially important resource for the field.
Weaknesses
(1) Framing and novelty
The paper appropriately situates its modality-selectivity findings against prior neuroimaging and intracranial work (citing Buchweitz et al. 2009 among others) and frames its novel contribution as bringing single-neuron resolution to a question that has previously been examined at population scales. This framing is fair as far as it goes. However, two issues remain. First, the paper does not engage with neuroimaging evidence that complicates its clean modality-selectivity story - most notably Wilson, Bautista, & McCarron (2018), who found that the dorsal superior temporal sulcus is activated by both intelligible and unintelligible inputs in both modalities. Several reconciliations of single-neuron modality selectivity with population-level cross-modal activation are possible (sparse coding, BOLD-vs-spiking dissociations, etc.), and the paper should engage with these possibilities. Second, the paper's discussion extends well beyond the modality-selectivity result that is its headline contribution, into broader claims about a "compositional code" for orthography and "hierarchical processing" across the language network. These broader claims are not supported by the analyses presented (see Weakness 3), and their inclusion distracts from and weakens the core finding rather than building on it. The paper would be stronger if these claims were either subjected to the population-level analyses they require or scaled back to exploratory observations.
These framing issues are compounded by writing problems that obscure what the paper is claiming. Some passages, such as the assertion that the dataset "suggests an unprecedented examination of linguistic features across various brain regions at various resolutions," are not interpretable as written and should be rewritten.
(2) Methodological concerns about the TRF analyses
The selectivity findings in Figures 3 and 5 rest on temporal response function / temporal receptive field (TRF) analyses with several core issues.
2.1) First, the construction of the TRF feature stream for the reading condition is not specified in the methods. Reading stimuli are presented in RSVP, with all letters of a word appearing simultaneously. How letter or letter-position features are mapped to a time-varying regressor reflects a substantive hypothesis about the psychological mechanisms of reading, with statistical consequences for what the TRF can recover and how reading and listening analyses can be compared.
2.2) Second, the stimulus distribution limits which effects can be reliably estimated. While the design appears balanced for some features (e.g., subject gender and number), the features that drive the TRF analyses - particularly letter identity and position in the orthographic TRF - are unlikely to be well covered in a small stimulus set. This raises a concern about high-variance feature importance estimates.
2.3) Third, the TRF feature set includes syntactic, semantic, and discourse predictors alongside phonological and orthographic features. The paper does not justify this choice in fitting single-neuron responses in STG and FSG, and the consequences for the unique-variance analyses are not discussed. Because syntactic features are correlated with phonological and orthographic features in natural stimuli (function words are short, have characteristic phoneme distributions, and so on), the unique variance attributed to each feature set depends on what is being controlled for. Including syntactic predictors when fitting STG or FSG neurons also risks inflating overall TRF fit by chance, particularly in the absence of cross-neuron correction.
2.4) Fourth, there seems to be no correction for multiple comparisons across the neuron × feature grid. The within-neuron feature-importance procedure briefly described in the Figure 3 caption may help combat overestimates of feature importance within a single fit, but does not address the question of how many of the "selective" neurons reported across the paper would survive correction at the population level. With many neurons, many features, and a limited stimulus set, some neurons will appear selective to some features by chance alone, and these are likely to be the ones that appear as example panels in figures.
Together, these issues mean the per-feature selectivity results cannot be interpreted as the paper currently interprets them. This is consequential because the per-feature selectivity findings underpin the paper's broader claims about a compositional code for orthography and about hierarchical processing across feature levels.
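The multiple-comparisons concern in 2.4 can be made concrete with a small calculation. The sketch below is illustrative only: the count of 1000 neurons follows the abstract, while the 20 features and the 0.05 threshold are assumed values, and p-values are treated as independent and uniform under the null, which real TRF statistics need not satisfy.

```python
import numpy as np

def expected_false_positives(n_neurons, n_features, alpha):
    """Expected number of neuron-feature pairs passing an uncorrected
    threshold alpha when no pair carries a true effect."""
    return n_neurons * n_features * alpha

def simulate_null_selectivity(n_neurons, n_features, alpha, seed=0):
    """Draw uniform p-values (pure null) and count neurons that look
    'selective', i.e. have at least one feature with p < alpha."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(size=(n_neurons, n_features))
    return int((p < alpha).any(axis=1).sum())

# Hypothetical sizes: 1000 neurons, 20 features (assumed), alpha = 0.05.
n_pairs_expected = expected_false_positives(1000, 20, 0.05)
n_selective_null = simulate_null_selectivity(1000, 20, 0.05)
print(n_pairs_expected, n_selective_null)
```

Under these assumptions, about 1000 neuron-feature pairs cross threshold by chance, and roughly 64% of neurons (1 - 0.95^20) show at least one spuriously "selective" feature, which is why a population-level correction matters before example neurons are selected.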
(3) Claims that outrun the evidence
Several of the paper's broader claims are not supported by the analyses presented.
3.1) The authors claim a "compositional code" for orthography, in which single neurons code for the combination of letter identity and position. This claim is illustrated with two example neurons. A claim about a coding scheme is a population-level claim and requires a population-level analysis. A natural test would be a per-neuron model comparison between a TRF with letter identity alone and a TRF including letter identity × position interactions, controlled for model complexity, asking how many neurons show improved prediction with the interaction features. As noted above in §2.2, this analysis would also need to grapple with which letters and positions the data can support estimating. There is a potential connection to the data sparsity worries here: the n=2 example neurons may have the only selectivity profiles for which the relevant interactions could be estimated at all.
3.2) The "hierarchical processing" claim is motivated by neurons selective to features at multiple levels - graphemes and sub-graphemes in reading, single phonemes and diphthongs in listening. This claim is not specified mechanistically. The paper does not state what kind of structural linguistic hierarchy is intended (segmental phonology to syllabic structure?), what kind of hierarchical neurocomputational mechanism is being proposed, or why selectivity at multiple levels of a feature hierarchy is evidence for that mechanism rather than for any other mechanism (e.g., parallel feature detectors). As written, the claim is too underspecified to evaluate.
3.3) The "forked letters" finding (selectivity to k, v, w, y, z) is potentially confounded with letter frequency and co-occurrence structure. These letters are low-frequency, with some exhibiting strong positional asymmetries, and they infrequently co-occur with other letters. Under the unique-variance analysis, decorrelation from other features inflates apparent unique variance even in the absence of genuine selectivity.
3.4) The word-length effect in Figure 4 is established by PCA on the top five fusiform neurons, with no analysis showing the effect is qualitatively similar across a broader selection. Beyond establishing that something varies with word length, the paper makes no substantive claim about what the neural code represents - for instance, whether it reflects letter- or word-specific processing or a more general visual response to stimulus extent. Prior intracranial work has reported word-length effects in regions posterior to the VWFA but not within it (Thesen et al. 2012), raising the question of whether the effect reported here reflects letter-specific processing or a more general visual response that happens to correlate with stimulus extent.
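The coverage question raised in 2.2 and 3.1 (which letter × position combinations the stimulus set can support estimating) can be checked directly from the word list. The sketch below is a minimal illustration: the toy word list, the function names, and the minimum-count criterion are all assumptions, not the authors' stimuli or procedure.

```python
from collections import Counter

def letter_position_counts(words):
    """Count how often each (letter, zero-based position) pair occurs."""
    counts = Counter()
    for word in words:
        for pos, ch in enumerate(word.lower()):
            if ch.isalpha():
                counts[(ch, pos)] += 1
    return counts

def estimable_pairs(counts, min_count):
    """Pairs seen at least min_count times; rarer pairs give high-variance
    interaction weights and are candidates for exclusion or pooling."""
    return sorted(pair for pair, n in counts.items() if n >= min_count)

# Toy word list (hypothetical, not the study's 508 words).
words = ["what", "when", "who", "the", "window", "saw", "view"]
counts = letter_position_counts(words)
print(counts[("w", 0)], counts[("w", 2)])
print(estimable_pairs(counts, min_count=3))
```

In this toy list, word-initial "w" occurs four times while "w" in third position occurs once, so a position-specific "w" effect would rest almost entirely on "wh-" words. The same tabulation on the real stimulus set would show which interaction terms are estimable at all.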
(4) Missed opportunities
Several aspects of the paper are not so much wrong as underdeveloped, in ways that the authors are well-positioned to address.
4.1) The cross-scale comparison between single-neuron, micro-wire, and macro-wire signals is presented descriptively, without articulating what conclusion these analyses support about the relationship between scales of measurement. Given the rarity of simultaneous recordings at these scales, this is a substantial missed opportunity. The rasters in Figure 2 visually suggest a tight relationship between spiking and micro-population activity that is not evident in the summary in Figure 2g. This discrepancy is not explained. Characterizing the functional and temporal relationship linking spike rates to micro- and macro-HGA is a substantive scientific question, and the paper is well-positioned to address it.
4.2) The stimuli include controlled grammatical manipulations, but these manipulations are used as nuisance regressors in the TRF analyses rather than as the object of structured analysis. A design with controlled comparisons is being treated as if it were unconstrained naturalistic stimulation, which underuses the experimental structure the authors built.
4.3) Finally, the paper foregrounds the dataset as a contribution but does not describe data sharing plans. Given that several of this review's recommendations call for analyses the authors have not yet done, the long-term value of the dataset to the community will depend substantially on what is shared and how.
Buchweitz, A., Mason, R. A., Tomitch, L. M., & Just, M. A. (2009). Brain activation for reading and listening comprehension: An fMRI study of modality effects and individual differences in language comprehension. Psychology & Neuroscience, 2(2), 111-123.
Jobard, G., Vigneau, M., Mazoyer, B., & Tzourio-Mazoyer, N. (2007). Impact of modality and linguistic complexity during reading and listening tasks. NeuroImage, 34(2), 784-800.
Thesen, T., McDonald, C. R., Carlson, C., Doyle, W., Cash, S., Sherfey, J., Felsovalyi, O., Girard, H., Barr, W., Devinsky, O., Kuzniecky, R., & Halgren, E. (2012). Sequential then interactive processing of letters and words in the left fusiform gyrus. Nature Communications, 3, 1284.
Wilson, S. M., Bautista, A., & McCarron, A. (2018). Convergence of spoken and written language processing in the superior temporal sulcus. NeuroImage, 171, 62-74.
Author response:
We thank the editors and reviewers for their constructive feedback on our manuscript. We accept the reviewers' recommendations and will implement them fully in our revised manuscript and include all of the suggested literature references. Below, we highlight several key points raised during the evaluation and outline exactly how we will address them. We will also explicitly address every other point and minor recommendation raised by the reviewers in our final, comprehensive point-by-point response.
Population-level quantification and statistical thresholds: The reviewers noted that our manuscript relied on single-neuron examples without fully demonstrating how widespread these patterns are across the recorded population. To address this, we will add population-level quantification across the recorded units using standard False Discovery Rate (FDR) corrections for multiple comparisons. We will include summary tables in the text and add statistical threshold lines to the distribution figures to report the proportion of significant neurons per region.
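To make the planned quantification concrete, the following is a minimal sketch of one standard FDR procedure (Benjamini-Hochberg) followed by a per-region tally. It is an illustration only and not the authors' pipeline: the function names, the region labels, and the choice to correct across all units before splitting by region are assumptions.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests rejected under Benjamini-Hochberg FDR control."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up critical values: alpha * rank / n
    thresholds = alpha * np.arange(1, n + 1) / n
    below = ranked <= thresholds
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        # Reject every test up to the largest rank that meets its threshold
        k = np.nonzero(below)[0].max()
        rejected[order[:k + 1]] = True
    return rejected

def proportion_significant(p_values, regions, alpha=0.05):
    """Fraction of units per region that survive FDR correction.

    Assumption: one p-value per unit, corrected across all units jointly.
    """
    rejected = benjamini_hochberg(p_values, alpha)
    regions = np.asarray(regions)
    return {str(r): float(rejected[regions == r].mean())
            for r in np.unique(regions)}
```

Calling `proportion_significant(p_values, regions)` with one p-value and one region label per unit returns the kind of per-region proportion that a summary table would report.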
Identifying amodal neurons: Reviewers raised concerns that our classification of amodal language neurons required a more direct test. We will provide additional measures of modality and, in particular, we will implement a cross-modal generalization analysis in which our encoding models are trained on one modality (e.g., listening) and evaluated on the other (e.g., reading). Under this additional procedure, a neuron will be classified as amodal if its cross-modal predictive performance exceeds that of a baseline null model.
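The logic of such a cross-modal test can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis: it uses a plain ridge regression without time lags (a full TRF would add lagged copies of each feature), assumes the same modality-independent feature columns are available in both conditions, and builds the null distribution by circularly shifting the held-out response. All names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge weights; features are assumed z-scored, no intercept."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ y)

def cross_modal_generalization(X_train, y_train, X_test, y_test,
                               n_perm=500, lam=1.0, seed=0):
    """Train on one modality, test on the other, compare against a shift null.

    Returns the observed prediction correlation and a one-sided
    permutation p-value.
    """
    rng = np.random.default_rng(seed)
    weights = fit_ridge(X_train, y_train, lam)
    pred = X_test @ weights
    observed = np.corrcoef(pred, y_test)[0, 1]
    # Null: circular shifts break stimulus alignment but keep autocorrelation
    shifts = rng.integers(1, len(y_test), size=n_perm)
    null = np.array([np.corrcoef(pred, np.roll(y_test, s))[0, 1]
                     for s in shifts])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(observed), float(p_value)
```

Running the function in both directions (listening to reading and reading to listening) and requiring both to pass would be a stricter criterion than a single direction; which criterion is appropriate is a design choice the sketch leaves open.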
Isolating linguistic features from sensory confounds: A point was raised regarding whether some neurons were tracking low-level sensory properties (like sound amplitude or visual text size) rather than language features. We will address this by running encoding analyses that include additional basic acoustic envelopes and visual baseline properties as control variables. This will allow us to evaluate the unique variance explained by linguistic features after accounting for these low-level sensory baselines.
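One common way to estimate unique variance is to compare the cross-validated fit of a full model (control plus linguistic features) with that of a reduced model (control features only). The sketch below illustrates this idea with a lag-free ridge regression; the fold scheme, the regularization value, and all names are assumptions rather than the authors' settings.

```python
import numpy as np

def cv_r2(X, y, n_folds=5, lam=1.0):
    """Cross-validated R^2 of a ridge model, using contiguous folds."""
    n = len(y)
    pred = np.empty(n)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        X_tr, y_tr = X[train], y[train]
        # Center with training statistics only, to avoid leakage
        mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - mu_x
        w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                            Xc.T @ (y_tr - mu_y))
        pred[test_idx] = (X[test_idx] - mu_x) @ w + mu_y
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def unique_variance(X_control, X_linguistic, y, n_folds=5, lam=1.0):
    """Gain in cross-validated R^2 from adding linguistic features."""
    full = cv_r2(np.hstack([X_control, X_linguistic]), y, n_folds, lam)
    reduced = cv_r2(X_control, y, n_folds, lam)
    return full - reduced
```

A neuron driven only by the acoustic envelope should yield a gain near zero, whereas a neuron that also tracks a linguistic feature should yield a clearly positive gain.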
Evaluating the "Compositional Code" in the Fusiform Gyrus: Reviewers pointed out that our claim regarding a "compositional code" (neurons tracking a combination of letter identity and position) was supported primarily by individual examples. To provide population-level context, we will perform a model comparison across our fusiform gyrus neurons. We will compare a baseline letter-only model against a model that includes letter-by-position interactions to report how many neurons statistically support this compositional structure.
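The difference between the two models being compared lies in their feature spaces, which the sketch below makes explicit. It is illustrative only: the left-aligned position coding, the maximum word length, and the function names are assumptions, and other position schemes (for example, coding relative to word end or to fixation) are equally plausible.

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase

def letter_features(word):
    """Letter-only code: count of each letter, ignoring position."""
    x = np.zeros(len(ALPHABET))
    for ch in word.lower():
        if ch in ALPHABET:
            x[ALPHABET.index(ch)] += 1
    return x

def letter_by_position_features(word, max_len=8):
    """One-hot code for each (position, letter) pair.

    Assumption: positions are left-aligned and words are truncated
    at max_len letters.
    """
    x = np.zeros((max_len, len(ALPHABET)))
    for pos, ch in enumerate(word.lower()[:max_len]):
        if ch in ALPHABET:
            x[pos, ALPHABET.index(ch)] = 1
    return x.ravel()
```

Anagrams such as "tap" and "pat" receive identical letter-only vectors but distinct letter-by-position vectors, so a neuron that responds differently to them can only be captured by the interaction model. The interaction model has far more parameters, which is why the comparison needs to be cross-validated or otherwise penalized for complexity.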
TRF feature and procedure explanation: Reviewers requested clarification on the construction of our TRF features. We will update the Methods section to explicitly detail how the features were constructed for both modalities. We will also include a feature correlation matrix in the Supplementary Materials. Furthermore, in order to contrast possible low-level confounds with high-level linguistic features, we will conduct a control analysis tracking specific affixes across different structural roles: for example, comparing how neurons respond to the phoneme /-s/ when it functions as a plural number marker versus when it appears as part of a lexical item (e.g., pass) or as a third-person verb agreement marker. We will conduct such analyses in addition to fitting the main TRF models with these additional confounds included, ensuring a clear dissociation between high- and low-level features.
-