Synaptic Theory of Chunking in Working Memory

Curation statements for this article:
  • Curated by eLife


    eLife Assessment

    This valuable study links psychological theories of chunking with a physiological implementation based on short-term synaptic plasticity and synaptic augmentation. The theoretical derivation for increased memory capacity via hierarchical chunking is solid. However, the robustness of the model and the biological grounding of the mechanism (including its many hard-wired components, the chunking cues, and the parameter ranges), as well as its evaluation in the task settings that motivated the study, are incomplete. Additional simulations testing robustness in more cognitively and biologically realistic settings, a systematic parameter analysis, and stronger links to prior work would substantially strengthen the manuscript and increase its impact across disciplines.

This article has been Reviewed by the following groups



Abstract

Working memory often appears to exceed its basic span by organizing items into compact representations called chunks. Chunking can be learned over time for familiar inputs; however, it can also arise spontaneously for novel stimuli. Such on-the-fly structuring is crucial for cognition, yet the underlying neural mechanism remains unclear. Here we introduce a synaptic theory of chunking, in which short-term synaptic plasticity enables the formation of chunk representations in working memory. We show that a specialized population of “chunking neurons” selectively controls groups of stimulus-responsive neurons, akin to gating. As a result, the network maintains and retrieves the stimuli in chunks, thereby exceeding the basic capacity. Moreover, we show that our model can dynamically construct hierarchical representations within working memory through hierarchical chunking. A consequence of this proposed mechanism is a new limit on the number of items that can be stored and subsequently retrieved from working memory, depending only on the basic working memory capacity when chunking is not invoked. Predictions from our model were confirmed by analyzing single-unit responses in epileptic patients and memory experiments with verbal material. Our work provides a novel conceptual and analytical framework for understanding how the brain organizes information in real time.

Article activity feed

  2. Reviewer #1 (Public review):

    Summary:

    This study extends the short-term synaptic plasticity (STP)-based theory of activity-silent working memory (WM) by introducing a physiological mechanism for chunking that relies on synaptic augmentation (SA) and specialized chunking clusters. The model consists of a recurrent neural network comprising excitatory clusters representing individual items and a global inhibitory pool. The self-connections within each cluster dynamically evolve through the combined effects of STP and SA. When a chunking cue, such as a brief pause in a stimulus sequence, is presented, the chunking cluster transiently suppresses the activity of the item clusters, enabling the grouped items to be maintained as a coherent unit and subsequently reactivated in sequence. This mechanism allows the network to enhance its effective memory capacity without exceeding the number of simultaneously active clusters, which defines the basic capacity. They further derive a new upper limit of WM capacity, the new magic number. When the basic capacity is four, the upper bound for complete recall becomes eight, and the optimal hierarchical structure corresponds to a binary tree of two-item pairs forming four chunks that combine into two meta-chunks. Reanalysis of linguistic data and single-neuron recordings from human epilepsy patients (identifying boundary neurons) provides qualitative support for the model's predictions.
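    The synaptic dynamics summarized above can be sketched in rate-based form. The following is a minimal illustration, not the paper's actual implementation: Tsodyks-Markram-style facilitation and depression variables plus a slow augmentation variable, with all parameter values and the way augmentation enters the efficacy ((u + a) * x) chosen purely for illustration.

```python
import numpy as np

def simulate_synapse(rate_fn, T=10.0, dt=1e-3,
                     U=0.3, tau_f=1.5, tau_d=0.3,
                     tau_a=8.0, k_a=0.01):
    """Euler-integrate facilitation u, depression x, and a slow
    augmentation variable a, driven by a presynaptic rate (Hz).
    Returns the effective synaptic efficacy (u + a) * x over time.
    All parameter values here are illustrative placeholders."""
    n = int(round(T / dt))
    u, x, a = U, 1.0, 0.0
    eff = np.empty(n)
    for i in range(n):
        r = rate_fn(i * dt)
        u += dt * ((U - u) / tau_f + U * (1.0 - u) * r)  # facilitation
        x += dt * ((1.0 - x) / tau_d - u * x * r)        # depression
        a += dt * (-a / tau_a + k_a * r * (1.0 - a))     # augmentation
        eff[i] = (u + a) * x
    return eff

# A 1-s burst at 40 Hz, then silence: u and x relax quickly afterwards,
# but the slow augmentation variable keeps the efficacy above baseline
# for several seconds.
eff = simulate_synapse(lambda t: 40.0 if t < 1.0 else 0.0)
baseline = simulate_synapse(lambda t: 0.0)
```

    With these placeholder parameters, the efficacy remains elevated above its resting value for seconds after the burst ends, the kind of seconds-long "activity-silent" synaptic trace that the model's retrieval mechanism relies on.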

    Strengths:

    This study makes an important contribution to theoretical and computational neuroscience by proposing a physiologically grounded mechanism for chunking based on STP and SA. By embedding these processes in a recurrent neural network, the authors provide a unified account of how chunks can be formed, maintained, and sequentially retrieved through local circuit dynamics, rather than through top-down cognitive strategies. The work is conceptually original, analytically rigorous, and clearly presented, deriving a simple yet powerful capacity law that extends the classical magic number framework from four to eight items under hierarchical chunking. The modeling results are further supported by preliminary empirical evidence from linguistic data and single-neuron recordings in the human medial temporal lobe, lending credibility to the proposed mechanism. Overall, this is a well-designed and well-written study that offers novel insights into the neural basis of working-memory capacity and establishes a solid bridge between theoretical modeling and experimental findings.

    Weaknesses:

    This study is conceptually strong and provides an elegant theoretical framework, but several aspects limit its biological and empirical grounding.

    First, the control mechanism that triggers and suppresses chunking clusters remains only schematically defined. The model assumes that chunking events are initiated by pauses, prosodic cues, or internal control signals, but does not specify the underlying neural circuits (e.g., prefrontal-basal ganglia loops) that could mediate this gating in the brain. Clarifying where, when, and how the chunking clusters are turned on and off will be critical for establishing biological plausibility.

    Second, the network representation is simplified: item clusters are treated as non-overlapping and homogeneous, whereas real cortical circuits exhibit overlapping representations, distinct excitatory/inhibitory populations, and multiscale local and long-range connectivity. It remains unclear how robust the proposed dynamics and derived capacity limit would be under such biologically realistic conditions.

    Third, the model heavily relies on SA operating over a timescale of several seconds, yet in vivo, the time constants and prevalence of SA can vary widely across cortical regions and neuromodulatory states. The stability of the predicted "new magic number" under realistic noise levels and modulatory influences, therefore, needs to be systematically evaluated.

  3. Reviewer #2 (Public review):

    Summary:

    This work extends a previous recurrent neural network model of activity-silent working memory to account for well-established findings from psychology and neuroscience suggesting that working memory capacity constraints can be partially overcome when stimuli can be organized into chunks. This is accomplished via the introduction of specialized chunking clusters of neurons to the original model. When these chunking clusters are activated by a cue (such as a longer delay between stimuli), they rapidly suppress recently active stimulus clusters. This makes these stimulus clusters available for later retrieval via a synaptic augmentation mechanism, thereby expanding the network's overall effective capacity. Furthermore, these chunking clusters can be arranged in a hierarchical fashion, where chunking clusters are themselves chunked by higher-level chunking clusters, further expanding the network's overall effective capacity to a new "magic number", 2^{C-1} (where C is the basic capacity without chunking). In addition to illustrating the basic dynamics of the model with detailed simulations (Figures 1 and 2), the paper also utilizes qualitative predictions from the model to (re-)analyze data collected in previous experiments, including single-unit recordings from human medial temporal lobe as well as behavioral findings from a classic study of human memory.
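    The new "magic number" can be sanity-checked numerically. The sketch below assumes one particular reading of the retrieval constraint (hypothetical: holding one open node per level of the hierarchy plus its not-yet-retrieved siblings must fit within the basic capacity C, i.e. 1 + sum(b_i - 1) <= C for per-level branching factors b_i); under that assumption, a brute-force search over tree shapes recovers 2^{C-1}, achieved by binary branching.

```python
from itertools import product
from math import prod

def max_retrievable(C, max_depth=8):
    """Brute-force search over hierarchical chunking schemes described by
    per-level branching factors (b_1, ..., b_k).  Assumed retrieval load
    (an illustrative reading, not the paper's exact derivation): one open
    node per level plus its unretrieved siblings, i.e. 1 + sum(b_i - 1),
    which must not exceed the basic capacity C.  Retrievable items are
    the product of the branching factors."""
    best = C  # no chunking: C individual items
    for k in range(1, max_depth + 1):
        for bs in product(range(2, C + 1), repeat=k):
            if 1 + sum(b - 1 for b in bs) <= C:
                best = max(best, prod(bs))
    return best

# With basic capacity C = 4, the optimum is branching (2, 2, 2):
# two meta-chunks of two chunks of two items, i.e. 8 = 2**(4 - 1).
```

    Binary branching wins because a branching factor b costs b - 1 units of load while multiplying the item count by b, and repeated factors of 2 per unit of load beat any larger factor (two factors of 2 cost the same load as one factor of 3 but yield 4 > 3 items).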

    Strengths:

    The writing and figures are very clear, and the general topic is relevant to a broad interdisciplinary audience. The work is strongly theory-driven, but also makes some effort to engage with existing data from two empirical studies. The basic results showcasing how chunking can be achieved in an activity-silent working memory model via suppression and synaptic augmentation dynamics are interesting. Furthermore, we agree with the authors that the derivation of their new "magic number" is relatively general and could apply to other models, so those findings in particular may be of interest even to researchers using different modeling frameworks.

    Weaknesses:

    (1) Very important aspects of the model are assumed / hard-coded, raising the concern that it relies too much on an external controller, and that it would therefore be difficult to implement the same principles in a fully behaving model responsible for producing its own outputs from a sequence of stimuli (i.e., without a priori knowledge of the structure of incoming sequences).

    (i) One such aspect is the use of external chunking cues provided to the model at critical times to activate the chunking clusters. The simulations reported in the paper were conducted in a setting where signals to chunk are conveniently indicated by longer delays between stimuli. In this case, it is not difficult to imagine how an external component could detect the presence of such a delay and activate a chunking cluster in response. However, in order for the model to be more broadly applicable to different memory tasks that elicit chunking-related phenomena, a more general-purpose detector would be required (see further comments below and alternative models).

    (ii) Relatedly, and as the authors acknowledge in the discussion, the network relies on a pretty sophisticated external controller that decides when the individual chunking clusters are activated or deactivated during readout/retrieval. This seems especially complex in the hierarchical case. How might a network decide which chunking/meta-chunking clusters are activated/deactivated in which order? This was hard-coded in their simulations, but we imagine that it would be difficult to implement a general solution to this problem, especially in cases where there is ambiguity about which stimuli should be chunked, or where the structure of the incoming sequence is not known in advance.

    (iii) One of the central mechanisms of the model is the rapid synaptic plasticity in the inhibitory connections responsible for binding chunking clusters to their corresponding stimulus clusters. This mechanism again appears to have been hard-coded in the main simulations. Although we appreciate that the authors worked on one possible way that this could be implemented (Methods section D, Supplementary Figure S2), in the end, their solution seems to rely on precisely fine-tuning the timing with which stimuli are presented - a factor that seems unlikely to matter very much in humans/animals. This stands in contrast with models of working memory that rely on persistent activity, which are more robust to changes in timing. Note that we do not discount the possibility of activity-silent WM, and indeed it should be studied in its own right, but it is then even more important to highlight which of its features are dependent on the time constants, etc.

    (2) Another key shortcoming of this work is its limited direct engagement with empirical evidence and alternative computational accounts of chunking in WM. Although the efforts to re-analyze existing empirical results in light of the new predictions made by the model are commendable, in the end, we think they fall short of being convincing. As noted above, the model doesn't actually perform the same two tasks used in the human experiments, so direct quantitative comparisons between the model and human behavior or neural data are not possible. Instead, the authors rely on isolating two qualitative predictions of the model - the "dip" and "ramp" phenomena observed after a chunking cluster is activated (Figure 3), and the new magic number for effective capacity derived from the model in the case where stimuli are chunkable, which approximately converges with human recall performance in a memory study (Figure 4). Below, we highlight some specific issues related to these two sets of analyses, but the larger point is that if the model is making a commitment about how these neural mechanisms relate to behavioral phenomena, it would be important to test whether the model can produce the patterns of behavioral data in experimental paradigms that have been extensively used to characterize those phenomena. For example, modern paradigms characterizing capacity limits have been more careful to isolate the contributions of WM per se (whereas the original magic number 7 is now thought to reflect a combination of episodic and working memory; see Cowan, 2010). There are several existing models that more directly engage with this literature (e.g., Edin et al., 2009; Matthey et al., 2015; Nassar et al., 2018; Soni & Frank, 2025; Swan & Wyble, 2014; van den Berg et al., 2014; Wei et al., 2012), some of which also account for chunking-related phenomena (e.g., Wei et al., 2012; Nassar et al., 2018; Panichello et al., 2019; Soni & Frank, 2025). A number of related proposals suggest that WM capacity limits emerge from fundamentally different mechanisms than the one considered here - for example, content-related interference (Bays, 2014; Ma et al., 2014; Schurgin et al., 2020), or limitations in the number of content-independent pointers that can be deployed at a given time (Awh & Vogel, 2025), and/or the inherent difficulty of learning this binding problem (Soni & Frank, 2025). We think it would be worth discussing how these ideas could be considered complementary or alternatives to the ones presented here.

    (i) Single unit recordings. We found it odd that the authors chose to focus on evidence from single-unit recordings in the medial temporal lobe from a study focused on episodic memory. It was unclear how exactly these data are supposed to relate to their proposal. Is the suggestion that a mechanism similar to the boundary neurons might be operative in the case of working memory over shorter timescales in WM-related areas such as the prefrontal cortex, or that their chunking mechanism may relate not only to working memory but also to episodic memory in the medial temporal lobe?

    (ii) N-gram memory experiment. Our main complaint about the analysis of the behavioral data from the human memory study (Figure 4) is that the model clearly does not account for the main effect observed in that study - namely, the better recall observed for higher-order n-gram approximations to English. We acknowledge that this was perhaps not the main point of the analysis (which related more to the prediction about the absolute capacity limit M*), but it relates to a more general criticism that the model cannot account for chunking behavior associated with statistical learning or semantic similarity. Most of the examples used in the introduction and discussion are of this kind (e.g., expressions such as "Oh my God" or "Easier said than done", etc.). However, the chunking mechanism of the model should not have any preference for segmenting based on statistical regularities or semantic similarity - it should work just as well if statistical anomalies or semantic dissimilarity were used as external chunking cues. In our view, these kinds of effects are likely to relate to the brain's use of distributed representations that can capture semantic similarity and learn statistical regularities in the environment. Although these kinds of effects may be beyond the scope of this model, some effort could be made to highlight this in the discussion. But again, more generally, the paper would be more compelling if the model were challenged to simulate more modern experimental paradigms aimed at testing the nature of capacity limits in WM, or chunking, etc.

    (iii) There are a number of other empirical phenomena that we're not sure the model can explain. In particular, one of the hallmarks of WM capacity limits is a recency bias, where people are more likely to remember the most recent items at the expense of items presented prior to that (Oberauer et al., 2012). [There are also studies showing primacy effects in addition to recency effects, but the primacy effects are generally attributed to episodic rather than working memory - for example, introducing a distractor task abolishes the recency but not the primacy effect]. But the current model seems to make the opposite prediction: when the stimuli exceed its basic capacity, it appears to forget the most recent stimuli rather than the earliest ones (Figure 1d). This seems to result from the number of representations that can be reactivated within a cycle and thus seems inherent to the dynamics of the model, but the authors can clarify if, instead, it depends on the particular values of certain parameters. (In contrast, this recency effect is captured in other models with chunking capabilities based on attractor dynamics and/or gating mechanisms, e.g., Boboeva et al., 2023; Soni & Frank, 2025.) Relatedly, we're not sure if the model could account for the more recent finding that recall is specifically enhanced when chunks occur in early serial positions compared to later ones (Thalmann, Souza, & Oberauer, 2019).

  4. Reviewer #3 (Public review):

    The paper presents a synaptic mechanism for chunking in working memory, extending previous work of the last author by introducing specialized "chunking clusters", neural populations that can dynamically segment incoming items into chunks. The idea is that this enables hierarchical representations that increase the effective capacity of working memory. They also derive a theoretical bound for working memory capacity based on this idea, suggesting that hierarchical chunking expands the number of retrievable items beyond the basic WM capacity. Finally, they present neural and behavioral data related to their hypothesis.

    Strengths

    A major strength of the paper is its clear theoretical ambition of developing a mechanistic model of working memory chunking.

    Weaknesses

    Despite the inspiration in biophysical mechanisms (short-term synaptic plasticity with different time constants), the model is "cartoonish". It is unclear whether the proposed mechanism would work reliably in the presence of noise and non-zero background activity or in a more realistic implementation (e.g., a spiking network).

    As far as I know, there is no evidence for cyclic neural activation patterns, which are supposed to limit WM capacity (such as in Figure 1d). In fact, I believe there is no evidence for population bursts in WM, which are a crucial ingredient of the model. For example, Panichello et al. (2024) found evidence for periods during which working memory decoding accuracy decreases, but no population bursts were observed in their data. In brief, my critique is that including some biophysical mechanism in an abstract model does not make the model plausible per se.

    It is claimed that "our proposed chunking mechanism applies to both the persistent-activity and periodic-activity regimes, with chunking clusters serving the same function in each", but this is not shown. If the results and model predictions are the same, irrespective of whether WM is activity-silent or persistent, I suggest highlighting this more and including the corresponding simulations.

    The empirical validations of the model are weak. The single-unit analysis is purely descriptive, without any statistical quantification of the apparent dip-ramp pattern. I agree that the dip-ramp pattern may be consistent with the proposed model, but I don't believe that this pattern is a specific prediction of the proposed model. It seems just to be an interesting observation that may be compatible with several network mechanisms involving some inhibition and a rebound.

    Moreover, the reanalyses of n-gram behavioral data do not constitute a mechanistic test of the model. The "new magic number" depends strongly on structural assumptions about how chunking operates, and it is unclear whether human working memory uses the specific hierarchical scheme required to achieve the predicted limit.

    The presentation of the modeling results is highly compressed into two figures and is rather hard to follow. Plotting the activity of the different neural clusters in separate subplots or as heatmaps (x-axis: time, y-axis: neural population, color: firing rate) would help to clarify Figure 1d. Also, the control signals that activate the chunking clusters should be shown.
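    A heatmap of the kind suggested here is straightforward to produce; below is a minimal matplotlib sketch using synthetic firing-rate traces (the population count, burst pattern, and output filename are all placeholders, not data from the paper):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for script use
import matplotlib.pyplot as plt

# Toy traces standing in for simulated cluster activity:
# rows = neural populations (item + chunking clusters), columns = time bins.
rng = np.random.default_rng(0)
n_pops, n_bins = 6, 500
rates = rng.random((n_pops, n_bins)) * 5.0          # low background rate
for p in range(n_pops):                             # staggered bursts
    rates[p, 60 * p : 60 * p + 40] += 40.0

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(rates, aspect="auto", cmap="viridis",
               extent=[0.0, 5.0, n_pops - 0.5, -0.5])  # 5 s of activity
ax.set_xlabel("time (s)")
ax.set_ylabel("population")
ax.set_yticks(range(n_pops))
fig.colorbar(im, ax=ax, label="firing rate (Hz)")
fig.savefig("cluster_activity.png", dpi=150)
```

    One row per cluster with a shared time axis makes overlapping reactivation cycles much easier to read than superimposed rate traces.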

    Overall, the theoretical proposal is interesting, but its empirical grounding and biological plausibility need to be substantially reinforced.