Anticipation of temporally structured events in the brain

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    The study addresses a topic that is timely and of general interest. The findings represent a potentially very interesting contribution to the important question of how the brain comes to predict the future, in particular lifelike sequences of events. However, some of the main conclusions would require further statistical support.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

This article has been Reviewed by the following groups


Abstract

Learning about temporal structure is adaptive because it enables the generation of expectations. We examined how the brain uses experience in structured environments to anticipate upcoming events. During fMRI (functional magnetic resonance imaging), individuals watched a 90 s movie clip six times. Using a hidden Markov model applied to searchlights across the whole brain, we identified temporal shifts between activity patterns evoked by the first vs. repeated viewings of the movie clip. In many regions throughout the cortex, neural activity patterns for repeated viewings shifted to precede those of initial viewing by up to 15 s. This anticipation varied hierarchically in a posterior (less anticipation) to anterior (more anticipation) fashion. We also identified specific regions in which the timing of the brain’s event boundaries was related to those of human-labeled event boundaries, with the timing of this relationship shifting on repeated viewings. With repeated viewing, the brain’s event boundaries came to precede human-annotated boundaries by 1–4 s on average. Together, these results demonstrate a hierarchy of anticipatory signals in the human brain and link them to subjective experiences of events.

Article activity feed

  1. Author Response:

    Reviewer #1:

    In this study, Lee et al. reanalyzed a previous fMRI dataset (Aly et al., 2018) in which participants watched the same 90s movie segment six times. Using event-segmentation methods similar to Baldassano et al. (2017), they show that event boundaries shifted for the average of the last 5 viewings as compared to the first viewing, in some regions by as much as 12 seconds. Results provide evidence for anticipatory neural activity, with apparent differences across brain regions in the timescale of this anticipation, in line with previous reports of a hierarchy of temporal integration windows.

    – One of the key findings of the paper – long-timescale anticipatory event reinstatement – overlaps with the findings of Baldassano et al., 2017. However, the previous study could not address the multiple time scales/hierarchy of predictions. Considering that this is the novel contribution of the current study, more statistical evidence for this hierarchy should be provided.

    We agree that more statistical evidence for the hierarchy is critical. As noted above and described in more detail below, we did this in two ways. First, we related anticipation amounts to the position of brain regions along the anterior/posterior axis, and indeed found that anticipation significantly increases as one moves more anteriorly in the brain. Second, we explored whether brain regions with faster vs slower activity dynamics (i.e., more vs fewer events during the movie) showed differences in anticipation amounts. We found that regions that integrate information over more of the past (show fewer, longer events) show significantly more anticipation into the future.

    – The current hierarchy of anticipation is closely linked to (and motivated by) previous studies showing evidence of a hierarchy of temporal integration windows. Indeed, the question of the study was "whether this hierarchy also exists in a prospective direction". This question is currently addressed somewhat indirectly, by displaying above-threshold brain regions, but without directly relating this hierarchy to previous findings of temporal integration windows, and without directly testing the claimed "posterior (less anticipation) to anterior (more anticipation) fashion" (from abstract).

    Thank you for this important suggestion. We tested whether there is a hierarchy in the posterior (less anticipation) to anterior (more anticipation) direction by calculating the Spearman correlation between the Y-coordinate of each significant voxel (indexing how posterior vs anterior that voxel is) and the amount of anticipation in that voxel. We compared this correlation value to correlations between the Y-coordinate and the amount of anticipation in null maps produced by randomly permuting the order of the viewings. We observed a Spearman rho of 0.58 for the anterior/posterior axis (p = 0.0030). This relationship persisted when the analysis was done on the unthresholded anticipation map (Spearman’s rho = 0.42, p = 0.0028; Supplementary Figure 1). Furthermore, there were no significant relationships between anticipation and the left-to-right (X) axis or the inferior-to-superior (Z) axis. We now describe this as follows in the paper:

    In Methods:

    “To determine if anticipation systematically varied across the cortex in the hypothesized posterior-to-anterior direction, we calculated the Spearman correlation between the Y-coordinate of each significant (q < 0.05) voxel (indexing the position of that voxel along the anterior/posterior axis) and the mean amount of anticipation in that voxel. To obtain a p-value, the observed correlation was compared to a null distribution in which the Spearman correlation was computed with the null anticipation values from the permutation analysis described above, in which the order of the viewings was randomly scrambled for each participant. For comparison, the correlation was also computed for the X (left-right) and Z (inferior-superior) axes. This analysis was repeated on unthresholded anticipation maps, to examine if this hierarchy remained even when including regions whose anticipation amounts did not reach statistical significance.” (p.13)

    In the Results:

    “The magnitude of this shift varied along a posterior to anterior temporal hierarchy (Spearman’s rho = 0.58, p = 0.0030), with the most anterior regions in the temporal pole and prefrontal cortex showing shifts of up to 15 seconds on subsequent viewings compared to the first viewing. This hierarchy persisted even when computed on the unthresholded anticipation map including voxels that did not meet the threshold for statistical significance (Spearman’s rho = 0.42, p = 0.0028; see Supplementary Figure 1). There were no significant correlations with the left-to-right axis (rho = 0.06, p = 0.41 for thresholded map; rho = 0.12, p = 0.29 for unthresholded map) or the inferior-to-superior axis (rho = 0.07, p = 0.28 for thresholded map; rho = -0.11, p = 0.73 for unthresholded map). We obtained a similar map when comparing the first viewing to just the sixth viewing alone (see Supplementary Figure 2).” (p.4)
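The permutation logic behind this test can be sketched in a few lines of Python. Everything here is a toy stand-in: the Y-coordinates and anticipation values are synthetic (with a built-in posterior-to-anterior gradient plus noise), and the null is approximated by permuting the anticipation map directly, whereas the paper builds its null maps by scrambling the viewing order before re-fitting.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy stand-ins: one Y-coordinate (posterior = negative, anterior = positive)
# and one anticipation value (in seconds) per significant voxel
n_vox = 200
y_coord = rng.uniform(-100, 70, size=n_vox)
anticipation = 0.05 * y_coord + rng.normal(0, 2, size=n_vox)

rho_obs = spearmanr(y_coord, anticipation)[0]

# Null distribution: here approximated by shuffling the anticipation map;
# the paper instead re-computes anticipation with scrambled viewing orders
n_perm = 1000
null_rhos = np.array([spearmanr(y_coord, rng.permutation(anticipation))[0]
                      for _ in range(n_perm)])
p = (np.sum(null_rhos >= rho_obs) + 1) / (n_perm + 1)
print(f"Spearman rho = {rho_obs:.2f}, permutation p = {p:.4f}")
```

The same scaffold applies to the X and Z axes by swapping in the corresponding coordinate.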

    We also complemented this approach by looking at whether anticipation amounts vary systematically as a function of the optimal event timescale for a brain region. We first found the optimal number of HMM events for each brain region based on the first viewing of the movie clip. Regions with fewer events show slower timescales of processing than those with more events, and based on prior studies are known to integrate information over more of the past (Hasson et al., 2008; Hasson et al., 2015; Lerner et al., 2011). We then looked at anticipation within each timescale bin, comparing the observed relationship to a null distribution in which timescale values were correlated with permuted anticipation maps, for which repetition order was scrambled. As hypothesized, anticipation extended further into the future in regions with longer timescales (Spearman rho = 0.319, p = 0.00031; Supplementary Figure 3).

    These new analyses have been incorporated into the Methods and Results as follows:

    “To relate the timescales of anticipation to the intrinsic timescales of brain regions during the first viewing, we fit the HMM on the first viewing alone, varying the number of events from 2 to 10. The HMM was trained on the average response from half of the participants (fitting the sequence of activity patterns for the events and the event variance) and the log-likelihood of the model was then measured on the average response in the other half of the participants. The training and testing sets were then swapped, and the log-likelihoods from both directions were averaged together. Hyperalignment was not used during this fitting process, to ensure that the training and testing sets remained independent. The number of events that yielded the largest log-likelihood was identified as the optimal number of events for that searchlight. The optimal number of events was then compared to the anticipation timescale in that region (from the main analysis), using Spearman correlation.” (p.14)

    “We also compared how this hierarchy of anticipation timescales related to the intrinsic processing timescales in each region during the initial viewing of the movie clip. Identifying the optimal number of HMM events for each searchlight, we observed a timescale hierarchy similar to that described in previous work, with faster timescales in sensory regions and slower timescales in more anterior regions (Supplementary Figure 3a). Regions with longer intrinsic timescales also showed a greater degree of anticipation with repeated viewing (Supplementary Figure 3b).” (p.4)
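The split-half model selection described above can be illustrated with a minimal sketch. The event model below is a gross simplification: equal-length segments with mean patterns stand in for the HMM fit, and the data are simulated group averages with a planted number of events.

```python
import numpy as np

rng = np.random.default_rng(3)
T, V, n_true = 96, 30, 4                      # timepoints, voxels, true event count
patterns = rng.normal(size=(n_true, V))       # one spatial pattern per event
clean = np.repeat(patterns, T // n_true, axis=0)   # piecewise-constant timecourse

def group_average(n_subj):
    # Average of n_subj noisy simulated "participants" watching the same clip
    return clean + rng.normal(0, 1, size=(n_subj, T, V)).mean(axis=0)

train, test = group_average(8), group_average(8)

def cv_score(k):
    """Fit event patterns as train-half means over k equal-length segments
    (a crude stand-in for the HMM fit), then score them on the held-out half."""
    bounds = np.linspace(0, T, k + 1).astype(int)
    pred = np.vstack([np.tile(train[a:b].mean(axis=0), (b - a, 1))
                      for a, b in zip(bounds[:-1], bounds[1:])])
    return -np.mean((test - pred) ** 2)       # higher is better

scores = {k: cv_score(k) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)  # recovers the planted event count
```

As in the paper, the held-out fit is best when the model's number of events matches the structure of the data; too few or misaligned events blur distinct patterns together.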

    – The analysis is based on averaging the data of the 5 repeated viewings and comparing this average with the data of the first viewing. This means that the repeated viewing condition had much more reliable data than the initial viewing condition. This could potentially affect the results (e.g. better fit to HMM). To avoid this bias, the 5 repeated viewings could be entered separately into the analysis (e.g., each separately compared to the first viewing) and results averaged at the end. Alternatively, only the 6th viewing could be compared to the first viewing (as in Aly et al., 2018).

    Thank you for this suggestion, which we have implemented. Rather than averaging the timecourses from the repeated viewings, we fit the HMM jointly to data from all six viewings. This joint fit constrained the event patterns to be the same across viewings, but allowed the timing of these patterns to vary freely across viewings. We then averaged the anticipation results (from the time-by-events plots) across viewings 2-6, as suggested. The same pattern of results was observed, and this is now the main analysis in the paper (Figure 2). We also compared the first viewing to the last viewing, as suggested. As shown in Supplementary Figure 2, this analysis also showed a similar pattern of results.

    – Correlation analysis (Fig 6). "we tested whether these correlations were significantly positive for initial viewing and/or repeated viewing, and whether there was a significant shift in correlation between these conditions". It was not clear to me how we should interpret the correlation results in Figure 6. Might a lower correlation for repeated viewing not also reflect general suppression (e.g. participants no longer paying attention to the movie)? Perhaps comparing the correlations at the optimal lag (for each cluster) might help to reduce this concern; that is, the correlation difference would only exist at lag-0.

    We agree that a lower correlation for repeated vs. initial viewing could reflect cognitive processes unrelated to anticipation. Thus, the drop in correlation at lag 0 is not as important or meaningful as a shift in the peak correlation with multiple viewings. In particular, the peak correlation value might be the same for first vs repeated viewings, but a shift in the timing of that peak correlation would support our hypothesis of anticipation.

    We addressed this issue above, under Essential Revision #3, but also include our response below for convenience. We conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < .05 FDR criterion.

    The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

    – Correlation analysis (Figure 6). "For both of these regions the initial viewing data exhibits transitions near the annotated boundaries, while transitions in repeated viewing data occur earlier than the annotated transitions" How was this temporal shift statistically assessed?

    The reviewer rightly noted that we did not statistically assess this shift in the first submission; that assessment was based on visual inspection. We now statistically assess whether the relationship between human-annotated event boundaries and the brain’s event transitions shifts with movie repetitions (see response above and Figure 5). We also test whether the brain’s event boundaries reliably occur before, after, or aligned with the human-annotated event boundaries. To that end, we first found the timing of the peak cross-correlation between the brain’s event boundaries and human-annotated event boundaries for each of the three clusters that emerged from the preceding analysis, separately for initial vs repeated viewings. We then obtained confidence intervals for the timing of those peaks by bootstrapping across participants who did the event annotations. In particular, we obtained the timing of the peak cross-correlation between the brain’s event transitions and each of the bootstrapped human-annotated event transitions, and used the bootstrapped timing distribution to find the upper and lower bounds of a 95% confidence interval (measured in seconds).
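The bootstrap procedure can be sketched as follows. The raters, boundary times, and timecourses below are all toy stand-ins, and the lag is estimated with a plain argmax rather than the quadratic-fit refinement used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(2)
n_raters, T = 20, 300
true_bounds = [50, 120, 200, 260]             # "true" boundary times, in TRs

# Each hypothetical rater marks every boundary with up to +/-2 TRs of jitter
rater_marks = np.zeros((n_raters, T))
for r in range(n_raters):
    for b in true_bounds:
        rater_marks[r, b + rng.integers(-2, 3)] = 1

# Simulated brain boundary timecourse that leads the annotations by 3 TRs
brain = gaussian_filter1d(np.roll(rater_marks.sum(axis=0), -3), sigma=2)

def optimal_lag(annot, brain, max_lag=10):
    # Lag (in TRs) at which the annotation timecourse best matches the brain
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = [np.corrcoef(np.roll(annot, lag), brain)[0, 1] for lag in lags]
    return lags[int(np.argmax(corrs))]

# Bootstrap across raters: resample raters with replacement, rebuild the
# annotation timecourse, re-estimate the lag, take a percentile interval
boot_lags = []
for _ in range(500):
    sample = rng.integers(0, n_raters, size=n_raters)
    annot = gaussian_filter1d(rater_marks[sample].sum(axis=0), sigma=2)
    boot_lags.append(optimal_lag(annot, brain))
lo, hi = np.percentile(boot_lags, [2.5, 97.5])
print(f"95% CI for the optimal lag: [{lo:.1f}, {hi:.1f}] TRs")
```

A confidence interval that excludes zero, as here, is what licenses the claim that the brain's transitions reliably precede (or follow) the annotated boundaries.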

    We found that, for the initial movie viewing, two of three clusters had event transitions that occurred after subjective event boundaries (95% CIs for Fusiform Gyrus = [0.14, 1.99]; Superior Temporal Sulcus = [1.48, 8.53]). The last cluster had a peak correlation with event boundaries in the movie that was not different from a lag of 0 (i.e., the brain’s event transitions and the human-annotated event boundaries were aligned (95% CIs for Middle Temporal Gyrus = [-0.27, 2.86]). For the repeated movie viewings, this relationship shifted such that, for all three clusters, the brain’s event transitions reliably preceded event boundaries in the movie (95% CIs for Fusiform Gyrus = [-1.56, -0.26], Superior Temporal Sulcus = [-3.06, -1.69], Middle Temporal Gyrus = [-4.06, -1.83]). These shifts are largely consistent with mean anticipation amounts in each of these clusters (Fusiform Gyrus = 2.32s; Superior Temporal Sulcus = 2.18s; Middle Temporal Gyrus = 1.18s).

    This updated analysis is described in Methods and Results as follows:

    “We compared the event boundaries identified by the HMM within each searchlight to the event boundaries annotated by human observers. To obtain an event boundary timecourse from the annotations, we convolved the number of annotations (across all raters) at each second with the hemodynamic response function (HRF) (Figure 4). Separately, we generated a continuous measure of HMM "boundary-ness" at each timepoint by taking the derivative of the expected value of the event assignment for each timepoint, as illustrated in Figure 1d. Moments with high boundary strength indicate moments in which the brain pattern was rapidly switching between event patterns. We cross-correlated the HMM boundary strength timecourse for each viewing with the annotated event boundary timecourse, shifting the annotated timecourse forward and backward to determine the optimal temporal offset (with the highest correlation). We measured the timing of the peak correlation by identifying the local maximum in correlation closest to zero lag, then fitting a quadratic function to the maximum-correlation lag and its two neighboring lags and recording the location of the peak of this quadratic fit. This produced a continuous estimate of the optimal lag for each viewing. We measured the amount of shift between the optimal lag for the first viewing and the average of the optimal lags for repeated viewings, and obtained a p value by comparing to the null distribution over maps with permuted viewing orders (as in the main analysis), then performed an FDR correction.

    We identified three gray-matter clusters significant at q < 0.05. To statistically assess whether the optimal lags differed from zero in the three searchlights maximally overlapping these three clusters, we repeated the cross-correlation analysis in 100 bootstrap samples, in which we resampled from the raters who generated the annotated event boundaries. We obtained 95% bootstrap confidence intervals for the maximally-correlated lag on the first viewing and for the average of the maximally-correlated lags on repeated viewings.” (p.14)
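The quadratic-fit peak estimation described in this passage can be illustrated with a short sketch. The boundary timecourses below are synthetic Gaussian bumps, not real data, with the "brain" timecourse constructed to lead the annotations by 3 TRs.

```python
import numpy as np

def peak_lag(brain_bound, annot_bound, max_lag=10):
    """Lag (in TRs) of the peak cross-correlation: find the local maximum
    closest to zero lag, then refine it with a quadratic fit through the
    peak and its two neighbouring lags."""
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = np.array([np.corrcoef(np.roll(annot_bound, lag), brain_bound)[0, 1]
                      for lag in lags])
    local_max = [i for i in range(1, len(lags) - 1)
                 if corrs[i] >= corrs[i - 1] and corrs[i] >= corrs[i + 1]]
    i = min(local_max, key=lambda j: abs(lags[j]))   # local max nearest lag 0
    a, b, _ = np.polyfit(lags[i - 1:i + 2], corrs[i - 1:i + 2], 2)
    return -b / (2 * a)                              # vertex of the parabola

# Toy boundary timecourses: smooth bumps, with the "brain" leading by 3 TRs
t = np.arange(300)
annot = sum(np.exp(-(t - c) ** 2 / 8.0) for c in (40, 95, 170, 230, 275))
brain = np.roll(annot, -3)
print(round(peak_lag(brain, annot), 2))   # -3.0: brain precedes annotations
```

The quadratic fit gives a continuous (sub-TR) lag estimate even though the correlation is only sampled at integer lags.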

    “We asked human raters to identify event transitions in the stimulus, labeling each ‘meaningful segment’ of activity (Figure 3). To generate a hypothesis about the strength and timing of event shifts in the fMRI data, we convolved the distribution of boundary annotations with a Hemodynamic Response Function (HRF) as shown in Figure 4. We then explored alignment between these human-annotated event boundaries and the event boundaries extracted from the brain response to each viewing, as shown in Figure 1d. In each searchlight, we cross-correlated the brain-derived boundary timecourse with the event annotation timecourse to find the temporal offset that maximized this correlation. We found three clusters in the Middle Temporal Gyrus (MTG), Fusiform Gyrus (FG), and Superior Temporal Sulcus (STS) in which the optimal lag for the repeated viewings was significantly earlier than for the initial viewing, indicating that the relationship between the brain-derived HMM event boundaries and the human-annotated boundaries was changing with repeated viewings (Figure 5). The HMM boundaries on the first viewing were significantly later than the annotated boundaries in FG and STS, while the optimal lag did not significantly differ from zero in MTG (95% confidence intervals for the optimal lag, in seconds: MTG = [-0.27, 2.86]; FG = [0.14, 1.99]; STS = [1.48, 8.53]). The HMM boundaries on repeated viewings were significantly earlier than the annotated boundaries in all three regions (95% confidence intervals for the average optimal lag, in seconds: MTG = [-4.06, -1.83]; FG = [-1.56, -0.26]; STS = [-3.06, -1.69]).” (p.6-7)
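The annotation-to-timecourse step can be sketched as follows. The rater counts are invented, and the SPM-style double-gamma HRF is a common default rather than necessarily the exact kernel the authors used.

```python
import numpy as np
from scipy.stats import gamma

t = np.arange(0, 30, 1.0)                 # 30 s kernel at 1 s resolution
# SPM-style canonical double-gamma HRF (a common default; the paper may
# have used a different canonical form)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
hrf /= hrf.max()

# Toy annotation counts: number of raters marking a boundary at each second
counts = np.zeros(90)
counts[[12, 35, 60]] = [8, 15, 10]

# Convolve the counts with the HRF to predict the fMRI boundary timecourse
boundary_tc = np.convolve(counts, hrf)[:len(counts)]
print(int(np.argmax(boundary_tc)))        # 40: densest annotation (35 s) + ~5 s HRF lag
```

Convolving with the HRF shifts and smooths the annotation histogram so that it lives on the same delayed timescale as the BOLD signal before cross-correlation.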

    – Not all clusters in Figure 2/6 look like contiguous and meaningful clusters. For example, cluster 9 appears to include insula as well as (primary?) sensorimotor cortex, and cluster 4 includes both ventral temporal cortex and inferior parietal cortex/TPJ. It is thus not clear what we can conclude from this analysis about specific brain regions. For example, the strongest r-diff is in cluster 4, but this cluster includes a very diverse set of regions.

    We agree with this assessment. Because dividing up large clusters would have to be done somewhat arbitrarily, we opted to remove the table that implied the existence of functionally homogeneous clusters. Instead, we will publicly share the unthresholded anticipation map on NeuroVault at https://identifiers.org/neurovault.collection:9584 in case others are interested in it for meta-analyses or comparison to their own work.

    Furthermore, our new analyses systematically compare anticipation across the cortical hierarchy and across regions with different event timescales. Those analyses allow us to say that anticipation varies from posterior to anterior regions of the brain, and that regions with longer event timescales also show further-reaching anticipation. We therefore believe that the current work offers important conclusions about how anticipation varies across the brain, rather than conclusions about any specific brain region’s role in anticipation or why that role arises.

    – In previous related work, the authors correlated time courses within and across participants, providing evidence for temporal integration windows. For example, in Aly et al., 2018 (same dataset), the authors correlated time courses across repeated viewings of the movie. Here, one could similarly correlate time courses across repeated viewings, shifting this time course in multiple steps and testing for the optimal lag. This would seem a more direct (and possibly more powerful) test of anticipation and would link the results more closely to the results of the previous study. If this analysis is not possible to reveal the anticipation revealed here, please motivate why the event segmentation is crucial for revealing the current findings.

    Thank you for bringing this up! This is indeed a simpler way to look for anticipation, but it is arguably less sensitive than the HMM approach. This is because the shifting analysis assumes that anticipation is relatively constant throughout the movie clip. For example, if one shifts the timecourses by 2s relative to one another, the assumption is that brain activity dynamics for repeated viewings will precede those for the initial viewing by 2s throughout the entire clip. Furthermore, one needs to systematically test multiple possible shifts in each brain region to identify the best-fitting amount of anticipation. We therefore opted to use the HMM approach because it does not assume constant anticipation amounts throughout the duration of the clip, but instead allows the amount of anticipation to vary dynamically. Furthermore, the HMM can naturally uncover different timescales of anticipation in different brain regions, without needing a priori hypotheses about what the extent of the shift is.

    Nevertheless, we took the reviewer’s advice and used this cross-correlation approach to examine our data. We ran a searchlight analysis to find the peak in the cross-correlation between activity dynamics within each searchlight for the first vs repeated viewings of the movie clip. We found a number of regions where the activity dynamics shifted earlier across movie repetitions, with the amount of this shift varying across regions. Interestingly, the regions that passed statistical significance were primarily in frontal and parietal cortex and were far less extensive than those revealed by the HMM analyses. Thus, the HMM approach did seem to be more sensitive and better able to detect anticipation, especially in more posterior parts of the brain with more subtle anticipatory shifts.

    We now include this analysis as Supplementary Figure 4, and briefly discuss it as follows:

    In the Methods:

    “For comparison, we also ran a searchlight looking for anticipatory effects using a non-HMM cross-correlation approach. Within each searchlight, we obtained an average timecourse across all voxels, and correlated the response to the first viewing with the average response to repeated viewings at differing lags. Using the same quadratic-fit approach for identifying the optimal lag described below, we tested whether the repeated-viewing timecourse was significantly ahead of the initial-viewing timecourse (relative to a null distribution in which the viewing order was shuffled within each subject). The p values obtained were then corrected for False Discovery Rate.” (p.14)

    In the Results:

    "We also compared these results to those obtained by using a simple cross-correlation approach, testing for a fixed temporal offset between the responses to initial and repeated viewing. This approach did detect significant anticipation in some anterior regions, but was much less sensitive than the more flexible HMM fits, especially in posterior regions (Supplementary Figure 4)." (p.4)

    Reviewer #2:

    Aly et al. investigated anticipatory signals in the cortex by analysing data in which participants repeatedly watched the same movie clip. The authors identified events using an HMM-based data-driven event segmentation method and examined how the timing of events shifted between the initial and repeated presentations of the same video clip. A number of brain regions were identified in which event timings were shifted earlier in time with repeated viewing. The main finding is that more anterior brain regions showed more anticipation than posterior brain regions. The reported findings are very interesting, the approach the authors used is innovative, and the main conclusions are supported by the results and analyses. However, many cortical regions did not show any anticipatory effects and it is not clear why that is. In part, this may be due to a number of suboptimal aspects of the analysis approach. In addition, the analyses of behavioural annotations are open to multiple interpretations.

    Methods and Results:

    1. The paper shows that across multiple regions in the cortex, there is significant evidence for anticipation of events with repeated viewing. However, there are also many areas that do not show evidence for anticipation. It is not clear whether this is due to a lack of anticipation in those areas, or due to noise in the data or low power in the analyses. There are two factors that may be causing this issue. First, the data that were used are not optimal, given the short movie clip and relatively low number of participants. Second, there are a number of important issues with the analyses that may have introduced noise in the observed neural event boundaries (see points 2-4 below).

    We agree that our previous analyses were suboptimal in several ways. We discuss the changes we have made to address this concern in response to points 2-4 below. We also now share unthresholded anticipation maps (Supplementary Figure 1), and show that an anticipation hierarchy is present even in those data. Thus, even if our approach failed to find statistically significant anticipation in some regions, the main claim of the paper holds when anticipation across the entire brain is considered.

    2. Across all searchlights, the number of estimated events was fixed to be the same as the number of annotated events. However, in previous work, Baldassano and colleagues (2017) showed that there are marked differences between regions in the timescales of event segmentation across the cortex. Therefore, it may be that in regions such as visual cortex, which tend to have very short events, the current approach identifies a mixture of neural activity patterns as one 'event'. This will add a lot of noise to the analysis and decrease the ability of the method to identify anticipatory event timings, particularly for regions lower in the cortical hierarchy that show many more events than tend to be observed in behavioural annotations.

    Thank you for raising this point. The reason we chose the same number of events for each region is to avoid confounding event numbers with anticipation amounts. Our concern was that, if we systematically vary the number of events used in the HMM along the anterior-posterior axis (based on the optimal event timescale), then any resulting differences in anticipation could potentially be driven by the fact that different HMM models were used in different regions. That is, one might see differences in ‘anticipation’ that are entirely driven by differences in the number of events used in the model. To avoid this confound, we fit identical models across the cortex (with a fixed number of events) during the anticipation analysis. We chose to keep this approach in our revision, but supplemented it with analyses that we hope address the relationships between optimal event numbers and amount of anticipation. We mentioned our new analysis in response to comments from Reviewer #1, but we include our response here as well for convenience.

    Our new analysis investigates anticipation amounts as a function of the optimal event timescale for a given brain region. We first found the optimal number of HMM events for a given brain region based on the first viewing of the movie clip. Regions with fewer events show slower timescales of processing than those with more events, and based on prior studies are known to integrate information over more of the past (Hasson et al., 2008; Hasson et al., 2015; Lerner et al., 2011). We then looked at anticipation within each timescale bin, while keeping the number of events fixed at seven. The observed relationship was compared to a null distribution in which timescale values were correlated with permuted anticipation maps, for which repetition order was scrambled. As hypothesized, anticipation extended further into the future in regions with longer timescales (Spearman rho = 0.319, p = 0.00031; Supplementary Figure 3). We believe this analysis nicely links our work to studies of hierarchical information integration.

    These new analyses have been incorporated into the Methods and Results as follows:

    “To relate the timescales of anticipation to the intrinsic timescales of brain regions during the first viewing, we fit the HMM on the first viewing alone, varying the number of events from 2 to 10. The HMM was trained on the average response from half of the participants (fitting the sequence of activity patterns for the events and the event variance) and the log-likelihood of the model was then measured on the average response in the other half of the participants. The training and testing sets were then swapped, and the log-likelihoods from both directions were averaged together. Hyperalignment was not used during this fitting process, to ensure that the training and testing sets remained independent. The number of events that yielded the largest log-likelihood was identified as the optimal number of events for that searchlight. The optimal number of events was then compared to the anticipation timescale in that region (from the main analysis), using Spearman correlation.” (p.14)

    “We compared how this hierarchy of anticipation timescales related to the intrinsic processing timescales in each region during the initial viewing of the movie clip. Identifying the optimal number of HMM events for each searchlight, we observed a timescale hierarchy similar to that described in previous work, with faster timescales in sensory regions and slower timescales in more anterior regions (Supplementary Figure 3a). Regions with longer intrinsic timescales also showed a greater degree of anticipation with repeated viewing (Supplementary Figure 3b).” (p.4)

    3. If I understand correctly, the HMM event segmentation model was applied to data from voxels within a searchlight that were averaged across participants. Regular normalization methods typically do not lead to good alignment at the level of single voxels (Feilong et al., 2018, Neuroimage). Therefore, averaging the data without first hyperaligning them may lead to noise due to functional alignment issues within searchlights.

    Thank you for this suggestion! We re-ran all analyses after hyperalignment (using the Shared Response Model approach; Chen et al., 2015), and anticipatory signals are generally more robust and widespread when the analyses are conducted in the hyperaligned space. We have therefore replaced all the main analyses in the paper with this new approach.

    4. In the analyses the five repeated viewings of the clips were averaged into a single dataset. However, it is likely that participants' ability to predict the upcoming information still increased after the first viewing. That is especially true for perceptual details that may not have been memorised after watching the clip once, but will be memorised after watching it five times. It is not clear why the authors chose to average viewings 2-6 rather than analyse only viewing 6, or perhaps even more interesting, look at how predictive signals varied with the number of viewings. I would expect that especially for early sensory regions, predictive signals increase with repeated viewing.

    Thank you for this suggestion, which we have implemented. Rather than averaging the timecourses from the repeated viewings, we fit the HMM jointly to data from all six viewings. This joint fit constrained the event patterns to be the same across viewings, but allowed the timing of these patterns to vary freely across viewings. We then averaged the anticipation results (from the time-by-event plots) across viewings 2-6, as suggested. The same pattern of results was observed, and this is now the main analysis in the paper (Figure 2). We also compared the first viewing to the last viewing, as suggested. As shown in Supplementary Figure 2, this analysis also showed a similar pattern of results.
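
    The per-viewing anticipation summary can be sketched from the HMM's probabilistic output. In this sketch, `prob` stands in for the event-by-timepoint probability matrix the HMM returns for one viewing; the averaging over viewings 2-6 follows the description above, though the exact area-under-the-curve computation in the paper may differ in detail.

```python
import numpy as np

def expected_event(prob):
    """prob: (n_events, n_timepoints) matrix of P(event | timepoint).
    Returns the expected event index at each timepoint."""
    n_events = prob.shape[0]
    return np.arange(n_events) @ prob  # sum_k k * P(k, t)

def anticipation(prob_first, probs_repeated):
    """Mean upward shift of the expected-event curve relative to the
    first viewing, averaged across the repeated viewings."""
    e_first = expected_event(prob_first)
    shifts = [np.mean(expected_event(p) - e_first) for p in probs_repeated]
    return float(np.mean(shifts))
```

    A positive value indicates that, on average, the repeated-viewing event assignments sit earlier in time (i.e., further along in the event sequence at each timepoint) than on the first viewing.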

    5. In the analyses of the alignment between the behavioural and neural event boundaries, the authors show the difference in correlation between the initial and repeated viewing without taking the estimated amount of anticipation into account. I wonder why the authors decided on this approach, rather than estimating the delay between the neural and behavioural event boundaries. The finding that is currently reported, i.e. a lower correlation between neural and annotated events in the repeated viewing condition, does not necessarily indicate anticipation. It could also suggest that with repeated viewing, participants' neural events are less reflective of the annotated events. Indeed, the results in Figure 5 suggest that the correlations are earlier but also lower for the repeated viewing condition.

    Thank you for raising this point; we agree that the previous analysis was suboptimal. We mentioned our new analysis in response to Essential Revision #3 and comments from Reviewer #1, but we include our response here as well for convenience. We agree that the most important test for this analysis is whether there is a systematic shift, across movie repetitions, in the timing of the peak cross-correlation between the brain’s event transitions and human-annotated event boundaries.

    To test this, we conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < 0.05 FDR criterion.

    The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

    Together, these results confirm that, in some regions, the best alignment between the brain’s event transitions and human-annotated event boundaries shifts over movie repetitions such that the brain’s event transitions start to occur earlier over repetitions. In particular, the brain’s events shift to precede subjective event boundaries.
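
    The peak cross-correlation analysis can be sketched as follows. `peak_lag` is an illustrative helper (positive return values mean the neural transitions precede the annotations); the real analysis may derive the transition timecourse and handle edge effects differently.

```python
import numpy as np

def peak_lag(transitions, annotations, max_lag=10):
    """Lag (in TRs) at which the neural event-transition timecourse best
    matches the annotated-boundary timecourse.

    Positive values mean the neural transitions precede the annotations
    (anticipation). np.roll wraps around at the edges, which is fine for
    this sketch but would need care in a production analysis.
    """
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        r = np.corrcoef(np.roll(transitions, lag), annotations)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

    Comparing `peak_lag` for viewing 1 against the mean across viewings 2-6 in each searchlight, with permutation-based significance testing, mirrors the shift analysis described above: anticipation corresponds to the peak lag becoming more positive over repetitions.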

    6. To do the comparison between neural and annotated event boundaries, the authors refit the HMM model to clusters of significant voxels in the main analysis. I wonder why this was done rather than using the original searchlights. By grouping larger clusters of voxels, which cover many searchlights with potentially distinct boundary locations, the authors may be introducing noise into the analyses.

    Thank you for this suggestion. Our new analyses comparing neural and annotated event boundaries were conducted on the same searchlights used for the main results, i.e., we do not refit the HMM to significant clusters of voxels.

    Discussion:

    1. To motivate their use of the HMM model, the authors state that: "This model assumes that the neural response to a structured narrative stimulus consists of a sequence of distinct, stable activity patterns that correspond to event structure in the narrative." If neural events are indeed reflective of the narrative event structure, what does it mean if these neural events shift in time? How does this affect the interpretation of the association between neural events and narrative events?

    Thank you for raising this issue, which we need to clarify. The HMM produces a probability distribution across states (events) at each timepoint. This probability distribution can reflect a combination of current and upcoming event representations. With more repetitions of the movie, these probability distributions start to shift, such that the expected value of the event assignments at any given time point moves toward upcoming events. Thus, it is not that the brain’s events no longer represent event structure in the narrative; they continue to represent current events while also starting to represent progressively more of upcoming events as well.

    To make that more concrete: during initial viewing, the HMM may be 100% confident that the brain’s representations reflect event #1 at the first time point in the movie. But during subsequent viewings, the same time point may be classified as 90% event #1 and 10% event #2. Thus, shifts in the expected value of the event indicate that the brain is anticipating components of upcoming events, even while continuing to represent the current event. The brain’s events are therefore still related to the narrative, but shift to incorporate upcoming events as well.

    We now clarify this as follows in the caption to Figure 1:

    “By fitting a Hidden Markov Model (HMM) jointly to all viewings, we can identify this shared sequence of event patterns, as well as a probabilistic estimate of event transitions. Regions with anticipatory representations are those in which event transitions occur earlier in time for repeated viewings of a stimulus compared to the initial viewing, indicated by an upward shift on the plot of the expected value of the event at each timepoint.” (p.3)

    And in the caption to Figure 2:

    “Because the HMM produces a probability distribution across states at each timepoint, which can reflect a combination of current and upcoming event representations, we plot the expected value of the event assignments at each timepoint.” (p.5)

    And in the Methods:

    “After fitting the HMM, we obtain an event by time-point matrix for each viewing, giving the probability that each timepoint belongs to each event. Note that because this assignment of timepoints to events is probabilistic, it is possible for the HMM to detect that the pattern of voxel activity at a timepoint reflects a mixture of multiple event patterns, allowing us to track subtle changes in the timecourse of how the brain is transitioning between events.” (p.13)
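
    The expected-value logic in the example above (90% event #1, 10% event #2) amounts to the following computation; the numbers are the illustrative ones from the text, not values from the data.

```python
import numpy as np

# P(event) at one timepoint, using the illustrative 90%/10% split above
p_first = np.array([1.0, 0.0])   # initial viewing: fully in event #1
p_repeat = np.array([0.9, 0.1])  # repeated viewing: partly anticipating event #2

events = np.array([1, 2])        # 1-indexed event labels
e_first = events @ p_first       # expected event = 1.0
e_repeat = events @ p_repeat     # expected event = 1.1
# The 0.1 upward shift in the expected event at this timepoint is the kind
# of anticipation signal that the main analysis aggregates over timepoints.
```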

    Reviewer #3:

    Lee et al. report results from an fMRI experiment with repeated viewings of a single movie clip, finding that different brain regions come to anticipate events to different degrees. The findings are brief but a potentially very interesting contribution to the literature on prediction in the brain, as they use rich movie stimuli. This literature has been limited as it has typically focused on fixed short timescales of possible anticipation, with many repetitions of static visual stimuli, leading to only one possible time scale of anticipation. In contrast, the current video design allows the authors to look in theory for multiple timescales of anticipation spanning simple sensory prediction across seconds to complex social dynamics across tens of seconds.

    The authors applied a Hidden Markov Model to multivoxel fMRI data acquired across six viewings of a 90 second movie. They fit a small set of components with the goal of capturing the different sequentially-experienced events that make up the clip. The authors report clusters of regions across the brain that shift in their HMM-identified events from the first viewing of the movie through the (average of the) remaining 5 viewings. In particular, more posterior regions show a shift (or 'anticipation') on the order of a few seconds, while more anterior regions show a shift on the order of ~10 seconds. These identified regions are then investigated in a second way, to see how the HMM-identified events correspond to subjective event segmentation given by a separate set of human participants. These data are a re-analysis of previously published data, presenting a new set of results and highlighting how open sharing of imaging data can have great benefits. There are a few important statistical issues that the authors should address in a revision in order to fully support their arguments.

    1. The authors report different timescales of anticipation across what may be a hierarchy of brain regions. However, do these timescales change significantly across regions? The paper rests in part on these differences, but the analyses do not yet actually test for any change. For this, there are multiple methods the authors could employ, but it would be necessary to do more than fit a linear model to the already-reported list of (non-independently-sorted) regions.

    Thank you for this important suggestion. We tested whether there is a hierarchy in the posterior (less anticipation) to anterior (more anticipation) direction by calculating the Spearman correlation between the Y-coordinate of each significant voxel (indexing how posterior vs anterior that voxel is) and the amount of anticipation in that voxel. We compared this correlation value to correlations between the Y-coordinate and the amount of anticipation in null maps produced by randomly permuting the order of the viewings. We observed a Spearman rho of 0.58 for the anterior/posterior axis (p = 0.0030). This relationship persisted when the analysis was done on the unthresholded anticipation map (Spearman’s rho = 0.42, p = 0.0028; Supplementary Figure 1). Furthermore, there were no significant relationships between anticipation and the left-to-right (X) axis or the inferior-to-superior (Z) axis. We now describe this as follows in the paper:

    In Methods:

    “To determine if anticipation systematically varied across the cortex in the hypothesized posterior-to-anterior direction, we calculated the Spearman correlation between the Y-coordinate of each significant (q < 0.05) voxel (indexing the position of that voxel along the anterior/posterior axis) and the mean amount of anticipation in that voxel. To obtain a p-value, the observed correlation was compared to a null distribution in which the Spearman correlation was computed with the null anticipation values from the permutation analysis described above, in which the order of the viewings was randomly scrambled for each participant. For comparison, the correlation was also computed for the X (left-right) and Z (inferior-superior) axes. This analysis was repeated on unthresholded anticipation maps, to examine if this hierarchy remained even when including regions whose anticipation amounts did not reach statistical significance.” (p.13)

    In the Results:

    “The magnitude of this shift varied along a posterior to anterior temporal hierarchy (Spearman’s rho = 0.58, p = 0.0030), with the most anterior regions in the temporal pole and prefrontal cortex showing shifts of up to 15 seconds on subsequent viewings compared to the first viewing. This hierarchy persisted even when computed on the unthresholded anticipation map including voxels that did not meet the threshold for statistical significance (Spearman’s rho = 0.42, p = 0.0028; see Supplementary Figure 1). There were no significant correlations with the left-to-right axis (rho = 0.06, p = 0.41 for thresholded map; rho = 0.12, p = 0.29 for unthresholded map) or the inferior-to-superior axis (rho = 0.07, p = 0.28 for thresholded map; rho = -0.11, p = 0.73 for unthresholded map). We obtained a similar map when comparing the first viewing to just the sixth viewing alone (see Supplementary Figure 2).” (p.4)
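
    The gradient test can be sketched as follows, assuming the null anticipation maps from the viewing-order permutations are already computed; `gradient_p_value` is an illustrative name, not a function from the analysis code.

```python
import numpy as np
from scipy.stats import spearmanr

def gradient_p_value(y_coords, anticipation, null_maps):
    """Spearman correlation between voxel Y-coordinate (posterior-to-
    anterior position) and anticipation, with a one-tailed p from the
    same correlation computed on null anticipation maps (derived from
    order-permuted viewings)."""
    rho, _ = spearmanr(y_coords, anticipation)
    null_rhos = np.array([spearmanr(y_coords, null)[0] for null in null_maps])
    # fraction of null correlations at least as large as the observed one
    p = (np.sum(null_rhos >= rho) + 1) / (len(null_rhos) + 1)
    return rho, p
```

    The same function applied to the X and Z coordinates gives the control comparisons for the left-to-right and inferior-to-superior axes.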

    1. The description of the statistical methods is unclear at critical points, which leads to questions about the strength of the results. The authors applied the HMM to group-averaged fMRI data to find the neural events. Then they run statistical tests on the difference in the area-under-the-curve (AUC) results from first to other viewings. It seems like they employ bootstrap testing using the group data? Perhaps it got lost, but the methods described here about resampling participants do not seem to make sense if all participants contributed to the results. Following this, they note that they used a q < 0.05 threshold after applying FDR for the resulting searchlight clusters, but based on their initial statement about the AUC tests, this is actually one-tailed? Is the actual threshold for all these clusters q < 0.10? That would be quite a lenient threshold and it would be hard to support using it. The authors should clarify how these statistics are computed.

    We agree that we did not clearly describe the methods. In the previous draft, we used a standard bootstrapping approach in which the individuals contributing to a group analysis are sampled with replacement. Specifically, for each bootstrap iteration, we constructed a bootstrap group average timecourse by resampling participants with replacement, and then ran our full analysis pipeline on this group average. In response to reviewer suggestions to use alternative approaches to obtain a measure of false positive rates, this analysis is no longer included in the manuscript.

    Instead, in the current draft, we have replaced this bootstrap approach with a permutation-based approach, in which (for each permutation iteration) we randomly shuffle each participant’s six responses to the six presentations of the clip, ensuring that there can be no consistent relationship between viewing order and anticipation. We ran our analysis pipeline on each of these permuted datasets, then fit a Normal null distribution to the resulting anticipation values obtained in each voxel. We obtained a one-tailed p value as the fraction of this distribution that exceeded the real result in this voxel. The p values across all voxels were then corrected for False Discovery Rate, and the resulting map was thresholded at q < 0.05.

    This approach is described in the Methods:

    “To assess statistical significance, we utilized a permutation-based null hypothesis testing approach. We constructed null datasets by randomly shuffling each participant’s six responses to the six presentations of the movie clip. The full analysis pipeline (including hyperalignment) was run 100 times, once on the real (unpermuted) dataset and 99 times on null (permuted) datasets, with each analysis producing a map of anticipation across all voxels. A one-tailed p value was obtained in each voxel by fitting a Normal distribution to the null anticipation values, and then finding the fraction of this distribution that exceeded the real result in this voxel (i.e., showed more anticipation than in our unpermuted dataset). Voxels were determined significant (q < 0.05) after applying the Benjamini-Hochberg FDR (False Discovery Rate) correction, as implemented in AFNI (Cox, 1996).” (p.13)
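
    The per-voxel significance procedure described above can be sketched as follows. `fdr_significant` is an illustrative helper; the paper uses AFNI's FDR implementation rather than this hand-rolled Benjamini-Hochberg step-up, but the logic is the same.

```python
import numpy as np
from scipy.stats import norm

def fdr_significant(real_map, null_maps, q=0.05):
    """One-tailed p per voxel from a Normal fit to its permutation nulls,
    then Benjamini-Hochberg FDR across voxels.

    real_map: (n_voxels,) anticipation from the unpermuted data.
    null_maps: (n_perms, n_voxels) anticipation from permuted data.
    """
    mu = null_maps.mean(axis=0)
    sigma = null_maps.std(axis=0, ddof=1)
    p = norm.sf(real_map, loc=mu, scale=sigma)  # P(null >= observed)

    # Benjamini-Hochberg step-up procedure across voxels
    order = np.argsort(p)
    n = p.size
    passed = p[order] <= q * np.arange(1, n + 1) / n
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    significant = np.zeros(n, dtype=bool)
    significant[order[:k]] = True
    return p, significant
```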

    3. Regarding the relationship to annotated transitions, the reported difference in correlations at zero lag don't tell the story that the authors wish they tell, and as such it does not appear that they support the paper. While it is interesting to see that the correlation at zero lag in the initial viewing is often positive in the independently identified clusters, the fact that there is a drop in correlation on repeated viewings doesn't, in itself, mean that there has been a shift in the temporal relationship between the neural and annotated events. A drop in correlation could also occur if there was just no longer any correlation between the neural and annotated events at any lag due to noisy measurements, or even if, for example, the comparison wasn't to repeated viewings but to a totally different clip. The authors want to say something about the shift in the waveform/peak, but they need to apply a different method to be able to make this argument.

    We addressed this issue above, under Essential Revision #3, but also include our response below for convenience.

    We conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < .05 FDR criterion.

    The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

    4. Imaging methods with faster temporal resolution could reveal even earlier reactivation, or replay, of the movies, that would be relatively invisible with fMRI, and the authors do not discuss relevant recent work. E.g. Michelmann et al. 2019 (Nat Hum Beh) and Wimmer et al. 2020 (Nat Neuro) are quite relevant citations from MEG. Michelmann et al. utilize similar methods and results very similar to the current findings, while Wimmer et al. use a similar 'story' structure with only one viewing (followed by cued retrieval) and find a very high degree of temporal compression. The authors vaguely mention faster timescale methods in the discussion, but it would be important to discuss these existing results, and the relative benefits of these methods versus the benefits and limitations of fMRI. It would be interesting and puzzling if there were multiple neural timescales revealed by different imaging methods.

    Thank you for raising this point and those key studies. We have added a section to the Discussion to consider that research and its relation to the current work:

    “The current fMRI study is complementary to investigations of memory replay and anticipation that use MEG and intracranial EEG (iEEG). In an MEG study, Michelmann et al. (2019) found fast, compressed replay of encoded events during recall, with the speed of replay varying across the event. Furthermore, an iEEG investigation found anticipatory signals in auditory cortex when individuals listened to the same story twice (Michelmann et al., 2020). In another MEG study, Wimmer et al. (2020) found compressed replay of previously encoded information. Replay was forward when participants were remembering what came after an event, and backward when participants were remembering what came before an event. The forward replay observed in the Wimmer et al. study may be similar to the anticipatory signals observed in the current study, although there was no explicit demand on memory retrieval in our paradigm. Thus, one possibility is that the anticipatory signals observed in MEG or iEEG are the same as those we observe in fMRI, except that they are necessarily sluggish and smoothed in time when measured via a hemodynamic response. This possibility is supported by fMRI work showing evidence for compressed anticipatory signals, albeit at a slower timescale relative to MEG (Ekman, Kok, & de Lange, 2017).

    An alternative possibility is that the anticipatory signals measured in our study are fundamentally different from those captured via MEG or iEEG. That could explain why we failed to find widespread anticipatory signals in primary visual or primary auditory cortex: the anticipatory signals in those regions might have been too fast to be captured with fMRI, particularly when competing with incoming, dynamic perceptual input. Future studies that obtain fMRI and MEG or iEEG in participants watching the same movie would be informative in that regard. It is possible that fMRI may be particularly well-suited for capturing relatively slow anticipation of stable events, as opposed to faster anticipatory signals relating to fast sub-events. Nevertheless, advances in fMRI analyses may allow the detection of very fast replay or anticipation, closing the gap between these methods and allowing more direct comparisons (Wittkuhn & Schuck, 2021).” (p.10)

    5. The original fMRI experiment contained three conditions, while the current results only examine one of these conditions. Why weren't the results from the two scrambled clip conditions in the original experiment reported? Presumably there were no effects observed, but given that the original report focused on a change in response over time in a scrambled video where the scrambled order was preserved across repetitions, and the current report also focuses on changes across viewings, it would be important to describe reasons for not expecting similar results to these new ones in the scrambled condition.

    We agree it would be very interesting to systematically compare anticipation in those different conditions! Unfortunately, the Scrambled datasets are not well suited to answering this question for a couple of reasons. First, there is an issue of sample size. All 30 participants in Aly et al. (2018) viewed the same Intact movie clip. However, the Scrambled-Fixed and Scrambled-Random conditions had two clip-to-condition assignments, such that only 15 participants within each condition viewed the same clip. Thus, less data is available to look at anticipation within the Scrambled conditions. Another limitation is that our HMM analyses depend on group-averaged fMRI data; to the extent the different individuals show similar brain activity dynamics, the analysis will be more robust. While the Intact condition does have high inter-subject correlations in activity dynamics across many parts of the brain, the Scrambled conditions have much lower inter-subject correlations. We found that this makes hyperalignment (which, given reviewer recommendations, we now do prior to all analyses) work relatively poorly for the Scrambled-Fixed condition, and also makes data in that condition much more noisy than that in the Intact condition. Applying hyperalignment to the Intact and Scrambled-Fixed conditions simultaneously also produced poor fits. Thus, because of these limitations, we do not believe that our group-level approach in this study is appropriate for studying anticipation in the Scrambled-Fixed condition. All that said, this question is of key interest to us, and we are actively running studies to determine how anticipation varies as a function of the stimuli used.
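
    For reference, the kind of inter-subject correlation comparison invoked above can be sketched as a leave-one-out computation; this is a generic ISC sketch, not the exact pipeline from Aly et al. (2018).

```python
import numpy as np

def loo_isc(data):
    """Leave-one-out inter-subject correlation for one voxel/region.

    data: (n_subjects, n_timepoints). Each subject's timecourse is
    correlated with the average of the remaining subjects, and the
    resulting r values are averaged."""
    n = data.shape[0]
    rs = []
    for s in range(n):
        others = np.delete(data, s, axis=0).mean(axis=0)
        rs.append(np.corrcoef(data[s], others)[0, 1])
    return float(np.mean(rs))
```

    Low leave-one-out ISC in a condition indicates weak shared activity dynamics across participants, which in turn limits the reliability of group-averaged HMM fits and hyperalignment in that condition.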

  2. Reviewer #3 (Public Review):

    Lee et al. report results from an fMRI experiment with repeated viewings of a single movie clip, finding that different brain regions come to anticipate events to different degrees. The findings are brief but a potentially very interesting contribution to the literature on prediction in the brain, as they use rich movie stimuli. This literature has been limited as it has typically focused on fixed short timescales of possible anticipation, with many repetitions of static visual stimuli, leading to only one possible time scale of anticipation. In contrast, the current video design allows the authors to look in theory for multiple timescales of anticipation spanning simple sensory prediction across seconds to complex social dynamics across tens of seconds.

    The authors applied a Hidden Markov Model to multivoxel fMRI data acquired across six viewings of a 90 second movie. They fit a small set of components with the goal of capturing the different sequentially-experienced events that make up the clip. The authors report clusters of regions across the brain that shift in their HMM-identified events from the first viewing of the movie through the (average of the) remaining 5 viewings. In particular, more posterior regions show a shift (or 'anticipation') on the order of a few seconds, while more anterior regions show a shift on the order of ~10 seconds. These identified regions are then investigated in a second way, to see how the HMM-identified events correspond to subjective event segmentation given by a separate set of human participants. These data are a re-analysis of previously published data, presenting a new set of results and highlighting how open sharing of imaging data can have great benefits. There are a few important statistical issues that the authors should address in a revision in order to fully support their arguments.

    1. The authors report different timescales of anticipation across what may be a hierarchy of brain regions. However, do these timescales change significantly across regions? The paper rests in part on these differences, but the analyses do not yet actually test for any change. For this, there are multiple methods the authors could employ, but it would be necessary to do more than fit a linear model to the already-reported list of (non-independently-sorted) regions.

    2. The description of the statistical methods is unclear at critical points, which leads to questions about the strength of the results. The authors applied the HMM to group-averaged fMRI data to find the neural events. Then they run statistical tests on the difference in the area-under-the-curve (AUC) results from first to other viewings. It seems like they employ bootstrap testing using the group data? Perhaps it got lost, but the methods described here about resampling participants do not seem to make sense if all participants contributed to the results. Following this, they note that they used a q < 0.05 threshold after applying FDR for the resulting searchlight clusters, but based on their initial statement about the AUC tests, this is actually one-tailed? Is the actual threshold for all these clusters q < 0.10? That would be quite a lenient threshold and it would be hard to support using it. The authors should clarify how these statistics are computed.

    3. Regarding the relationship to annotated transitions, the reported difference in correlations at zero lag don't tell the story that the authors wish they tell, and as such it does not appear that they support the paper. While it is interesting to see that the correlation at zero lag in the initial viewing is often positive in the independently identified clusters, the fact that there is a drop in correlation on repeated viewings doesn't, in itself, mean that there has been a shift in the temporal relationship between the neural and annotated events. A drop in correlation could also occur if there was just no longer any correlation between the neural and annotated events at any lag due to noisy measurements, or even if, for example, the comparison wasn't to repeated viewings but to a totally different clip. The authors want to say something about the shift in the waveform/peak, but they need to apply a different method to be able to make this argument.

    4. Imaging methods with faster temporal resolution could reveal even earlier reactivation, or replay, of the movies, that would be relatively invisible with fMRI, and the authors do not discuss relevant recent work. E.g. Michelmann et al. 2019 (Nat Hum Beh) and Wimmer et al. 2020 (Nat Neuro) are quite relevant citations from MEG. Michelmann et al. utilize similar methods and results very similar to the current findings, while Wimmer et al. use a similar 'story' structure with only one viewing (followed by cued retrieval) and find a very high degree of temporal compression. The authors vaguely mention faster timescale methods in the discussion, but it would be important to discuss these existing results, and the relative benefits of these methods versus the benefits and limitations of fMRI. It would be interesting and puzzling if there were multiple neural timescales revealed by different imaging methods.

    5. The original fMRI experiment contained three conditions, while the current results only examine one of these conditions. Why weren't the results from the two scrambled clip conditions in the original experiment reported? Presumably there were no effects observed, but given that the original report focused on a change in response over time in a scrambled video where the scrambled order was preserved across repetitions, and the current report also focuses on changes across viewings, it would be important to describe reasons for not expecting similar results to these new ones in the scrambled condition.

  3. Reviewer #2 (Public Review):

    Aly et al. investigated anticipatory signals in the cortex by analysing data in which participants repeatedly watched the same movie clip. The authors identified events using an HMM-based data-driven event segmentation method and examined how the timing of events shifted between the initial and repeated presentations of the same video clip. A number of brain regions were identified in which event timings were shifted earlier in time due to repeated viewing. The main finding is that more anterior brain regions showed more anticipation than posterior brain regions. The reported findings are very interesting, the approach the authors used is innovative and the main conclusions are supported by the results and analyses. However, many cortical regions did not show any anticipatory effects and it is not clear why that is. In part, this may be due to a number of suboptimal aspects of the analysis approach. In addition, the analyses of behavioural annotations are open to multiple interpretations.

    Methods and Results:

    1. The paper shows that across multiple regions in the cortex, there is significant evidence for anticipation of events with repeated viewing. However, there are also many areas that do not show evidence for anticipation. It is not clear whether this is due to a lack of anticipation in those areas, or due to noise in the data or low power in the analyses. There are two factors that may be causing this issue. First, the data that were used are not optimal, given the short movie clip and relatively low number of participants. Second, there are a number of important issues with the analyses that may have introduced noise in the observed neural event boundaries (see points 2-4 below).

    2. Across all searchlights, the number of estimated events was fixed to be the same as the number of annotated events. However, in previous work, Baldassano and colleagues (2017) showed that there are marked differences between regions in the timescales of event segmentation across the cortex. Therefore, it may be that in regions such as visual cortex, which tends to have very short events, the current approach identifies a mixture of neural activity patterns as one 'event'. This will add a lot of noise to the analysis and decrease the ability of the method to identify anticipatory event timings, particularly for regions lower in the cortical hierarchy, which show many more events than tend to be observed in behavioural annotations.
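    One data-driven way to let each region have its own number of events, in the spirit of Baldassano et al. (2017), is to score candidate segmentations by how much more correlated timepoint patterns are within events than across adjacent events, and pick the event count that maximizes this score. The sketch below is a minimal numpy illustration on synthetic data; the scoring rule and all names are assumptions for illustration, not the paper's method.

    ```python
    import numpy as np

    def within_minus_across(data, boundaries):
        """Score a segmentation of data (time x voxels) as the mean
        pattern correlation of timepoint pairs within the same event
        minus that of pairs straddling one boundary (adjacent events).
        Higher scores suggest the boundaries fall at genuine pattern
        transitions in the time course."""
        T = data.shape[0]
        # assign an event label to each timepoint
        labels = np.zeros(T, dtype=int)
        for b in boundaries:
            labels[b:] += 1
        corr = np.corrcoef(data)  # T x T timepoint-pattern correlations
        within, across = [], []
        for i in range(T):
            for j in range(i + 1, T):
                if labels[i] == labels[j]:
                    within.append(corr[i, j])
                elif labels[j] - labels[i] == 1:
                    across.append(corr[i, j])
        return np.mean(within) - np.mean(across)
    ```

    On synthetic data with a known event structure, the true boundaries score higher than misplaced ones, so sweeping candidate event counts (or boundary placements) and keeping the best score gives a per-region estimate rather than a single fixed number.
    
    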

    3. If I understand correctly, the HMM event segmentation model was applied to data from voxels within a searchlight that were averaged across participants. Standard normalization methods typically do not lead to good alignment at the single-voxel level (Feilong et al., 2018, Neuroimage). Therefore, averaging the data without first hyperaligning them may introduce noise due to functional alignment issues within searchlights.

    4. In the analyses, the five repeated viewings of the clips were averaged into a single dataset. However, it is likely that participants' ability to predict the upcoming information still increased after the first viewing. That is especially true for perceptual details that may not have been memorised after watching the clip once, but will be memorised after watching it five times. It is not clear why the authors chose to average viewings 2-6 rather than analyse only viewing 6, or, perhaps even more interesting, look at how predictive signals varied with the number of viewings. I would expect that especially for early sensory regions, predictive signals increase with repeated viewing.

    5. In the analyses of the alignment between the behavioural and neural event boundaries, the authors show the difference in correlation between the initial and repeated viewing without taking the estimated amount of anticipation into account. I wonder why the authors decided on this approach, rather than estimating the delay between the neural and behavioural event boundaries. The finding that is currently reported, i.e. a lower correlation between neural and annotated events in the repeated viewing condition, does not necessarily indicate anticipation. It could also suggest that with repeated viewing, participants' neural events are less reflective of the annotated events. Indeed, the results in Figure 5 suggest that the correlation peaks occur earlier but are also lower for the repeated viewing condition.

    6. To do the comparison between neural and annotated event boundaries, the authors refit the HMM model to clusters of significant voxels in the main analysis. I wonder why this was done rather than using the original searchlights. By grouping larger clusters of voxels, which cover many searchlights with potentially distinct boundary locations, the authors may be introducing noise into the analyses.

    Discussion:

    1. To motivate their use of the HMM model, the authors state that: "This model assumes that the neural response to a structured narrative stimulus consists of a sequence of distinct, stable activity patterns that correspond to event structure in the narrative." If neural events are indeed reflective of the narrative event structure, what does it mean if these neural events shift in time? How does this affect the interpretation of the association between neural events and narrative events?

  4. Reviewer #1 (Public Review):

    In this study, Lee et al. reanalyzed a previous fMRI dataset (Aly et al., 2018) in which participants watched the same 90s movie segment six times. Using event-segmentation methods similar to Baldassano et al. (2017), they show that event boundaries shifted for the average of the last 5 viewings as compared to the first viewing, in some regions by as much as 12 seconds. Results provide evidence for anticipatory neural activity, with apparent differences across brain regions in the timescale of this anticipation, in line with previous reports of a hierarchy of temporal integration windows.

    – One of the key findings of the paper – long-timescale anticipatory event reinstatement – overlaps with the findings of Baldassano et al., 2017. However, the previous study could not address the multiple time scales/hierarchy of predictions. Considering that this is the novel contribution of the current study, more statistical evidence for this hierarchy should be provided.

    – The current hierarchy of anticipation is closely linked to (and motivated by) previous studies showing evidence of a hierarchy of temporal integration windows. Indeed, the question of the study was "whether this hierarchy also exists in a prospective direction". This question is currently addressed somewhat indirectly, by displaying above-threshold brain regions, but without directly relating this hierarchy to previous findings of temporal integration windows, and without directly testing the claimed "posterior (less anticipation) to anterior (more anticipation) fashion" (from abstract).

    – The analysis is based on averaging the data of the 5 repeated viewings and comparing this average with the data of the first viewing. This means that the repeated viewing condition had much more reliable data than the initial viewing condition. This could potentially affect the results (e.g. better fit to HMM). To avoid this bias, the 5 repeated viewings could be entered separately into the analysis (e.g., each separately compared to the first viewing) and results averaged at the end. Alternatively, only the 6th viewing could be compared to the first viewing (as in Aly et al., 2018).

    – Correlation analysis (Fig 6). "we tested whether these correlations were significantly positive for initial viewing and/or repeated viewing, and whether there was a significant shift in correlation between these conditions". It was not clear to me how we should interpret the correlation results in Figure 6. Might a lower correlation for repeated viewing not also reflect general suppression (e.g. participants no longer paying attention to the movie)? Perhaps comparing the correlations at the optimal lag (for each cluster) might help to reduce this concern; that is, the correlation difference would only exist at lag-0.

    – Correlation analysis (Figure 6). "For both of these regions the initial viewing data exhibits transitions near the annotated boundaries, while transitions in repeated viewing data occur earlier than the annotated transitions" How was this temporal shift statistically assessed?

    – Not all clusters in Figure 2/6 look like contiguous and meaningful clusters. For example, cluster 9 appears to include insula as well as (primary?) sensorimotor cortex, and cluster 4 includes both ventral temporal cortex and inferior parietal cortex/TPJ. It is thus not clear what we can conclude from this analysis about specific brain regions. For example, the strongest r-diff is in cluster 4, but this cluster includes a very diverse set of regions.

    – In previous related work, the authors correlated time courses within and across participants, providing evidence for temporal integration windows. For example, in Aly et al., 2018 (same dataset), the authors correlated time courses across repeated viewings of the movie. Here, one could similarly correlate time courses across repeated viewings, shifting one time course in multiple steps and testing for the optimal lag. This would seem a more direct (and possibly more powerful) test of anticipation and would link the results more closely to those of the previous study. If this analysis cannot reveal the anticipation reported here, please motivate why the event segmentation is crucial for revealing the current findings.
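    The direct lag test the reviewer describes could look something like this: correlate the spatial pattern at each initial-viewing timepoint with the repeated-viewing pattern at a shifted timepoint, average over timepoints, and take the shift that maximizes the mean correlation. This is an illustrative sketch with synthetic data, not the authors' analysis; the function name and the restriction to non-negative (anticipatory) lags are assumptions.

    ```python
    import numpy as np

    def best_lag(initial, repeated, max_lag):
        """For each candidate lag >= 0, correlate the spatial pattern at
        initial-viewing time t with the repeated-viewing pattern at time
        t - lag, averaged over t. The best lag is the shift (in TRs) by
        which the repeated-viewing time course leads the initial one."""
        T = initial.shape[0]
        scores = {}
        for lag in range(max_lag + 1):
            rs = [np.corrcoef(initial[t], repeated[t - lag])[0, 1]
                  for t in range(lag, T)]
            scores[lag] = np.mean(rs)
        return max(scores, key=scores.get)
    ```

    With toy data in which the repeated-viewing time course is a copy of the initial one advanced by a couple of TRs, the recovered lag equals that advance, which is the anticipation estimate the reviewer proposes, without any event segmentation step.
    
    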

  5. Evaluation Summary:

    The study addresses a topic that is timely and of general interest. The findings represent a potentially very interesting contribution to the important question of how the brain comes to predict the future, in particular lifelike sequences of events. However, some of the main conclusions would require further statistical support.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)