Interplay between external inputs and recurrent dynamics during movement preparation and execution in a network model of motor cortex

Curation statements for this article:
  • Curated by eLife


    eLife assessment

    The study provides a recurrent network model of M1 for center-out reaches. The "neurons" in the model show uncorrelated tuning for movement direction during preparation and execution with dynamic transition between the two states. The continuous attractor model provides an interesting example of flexible switching between neural representations.


Abstract

The primary motor cortex has been shown to coordinate movement preparation and execution through computations in approximately orthogonal subspaces. The underlying network mechanisms, and the roles played by external and recurrent connectivity, are central open questions that need to be answered to understand the neural substrates of motor control. We develop a recurrent neural network model that recapitulates the temporal evolution of neuronal activity recorded from the primary motor cortex of a macaque monkey during an instructed delayed-reach task. In particular, it reproduces the observed dynamic patterns of covariation between neural activity and the direction of motion. We explore the hypothesis that the observed dynamics emerges from a synaptic connectivity structure that depends on the preferred directions of neurons in both preparatory and movement-related epochs, and we constrain the strength of both synaptic connectivity and external input parameters from data. While the model can reproduce neural activity for multiple combinations of the feedforward and recurrent connections, the solution that requires minimum external inputs is one where the observed patterns of covariance are shaped by external inputs during movement preparation, while they are dominated by strong direction-specific recurrent connectivity during movement execution. Our model also demonstrates that the way in which single-neuron tuning properties change over time can explain the level of orthogonality of preparatory and movement-related subspaces.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This article is aimed at constructing a recurrent network model of the population dynamics observed in the monkey primary motor cortex before and during reaching. The authors approach the problem from a representational viewpoint, by (i) focusing on a simple center-out reaching task where each reach is predominantly characterised by its direction, and (ii) using the machinery of continuous attractor models to construct network dynamics capable of holding stable representations of that angle. Importantly, M1 activity in this task exhibits a number of peculiarities that have pushed the authors to develop important methodological innovations which, to me, give the paper most of its appeal. In particular, M1 neurons have dramatically different tuning to reach direction in the movement preparation and execution epochs, and that fact motivated the introduction of a continuous attractor model incorporating (i) two distinct maps of direction selectivity and (ii) distinct degrees of participation of each neuron in each map. I anticipate that such models will become highly relevant as neuroscientists increasingly appreciate the highly heterogeneous, and stable-yet-non-stationary nature of neural representations in the sensory and cognitive domains.

    As far as modelling M1 is concerned, however, the paper could be considerably strengthened by a more thorough comparison between the proposed attractor model and the (few) other existing models of M1 (even if these comparisons are not favourable they will be informative nonetheless). For example, the model of Kao et al (2021) seems to capture all that the present model captures (orthogonality between preparatory and movement-related subspaces, rotational dynamics, tuned thalamic inputs mostly during preparation) but also does well at matching the temporal structure of single-neuron and population responses (shown e.g. through canonical correlation analysis). In particular, it is not clear to me how the symmetric structure of connectivity within each map would enable the production of temporally rich responses as observed in M1. If it doesn't, the model remains interesting, as feedforward connectivity between more than two maps (reflecting the encoding of many more kinematic variables) or other mechanisms (such as proprioceptive feedback) could well explain away the observed temporal complexity of neural responses. Investigating such alternative explanations would of course be beyond the scope of this paper, but it is arguably important for the readers to know where the model stands in the current literature.

    Below is a summary of my view on the main strengths and weaknesses of the paper:

    1. From a theoretical perspective, this is a great paper that makes an interesting use of the multi-map attractor model of Romani & Tsodyks (2010), motivated by the change in angular tuning configuration from the preparatory epoch to the movement execution epoch. Continuous attractor models of angular tuning are often criticised for being implausibly homogeneous/symmetrical; here, the authors address this limitation by incorporating an extra dimension to each map, namely the degree of participation of each neuron (the distribution of which is directly extracted from data). This extension of the classical ring model seems long overdue! Another nice thing is the direct use of data for constraining the model's coupling parameters; specifically, the authors adjust the model's parameters in such a way as to match the temporal evolution of a number of "order parameters" that are explicitly manifested (i.e. observable) in the population recordings.

    I believe the main weakness of this continuous attractor approach is that it - perhaps unduly - binarises the configuration of angular tuning. Specifically, it assumes that while angular tuning switches at movement onset, it is otherwise constant within each epoch (preparation and execution). I commend the authors for carefully motivating this in Figure 2 (2e in particular), by showing that the circular variance of the distribution of preferred directions is higher across prep & move than within either prep or move. While this justifies a binary "two-map model" to first order, the analysis nevertheless shows that preferred directions do change, especially within the preparatory epoch. Perhaps the authors could do some bootstrapping to assess whether the observed dispersion of PDs within sub-periods of the delay epoch is within the noise floor imposed by the finite number of trials used to estimate tuning curves. If it is, then this considerably strengthens the model; otherwise, the authors should say that the binarisation reflects an approximation made for analytical tractability, and discuss any important implications.

    We thank the reviewer for the suggested analysis. We have included this new analysis in Fig. S1.

    First of all, in Fig 2e of the previous version of the manuscript, we were considering three time windows during preparation and two time windows during movement execution. We are now using a shorter time window of 160ms, so that we can fit three time windows within either epoch. The results do not change qualitatively, and the results of the bootstrap analysis below do not change based on the definition of this time window.

    The bootstrap analysis is described in detail in the second paragraph of the Methods section (“Preparatory and movement-related epochs of motion”). The bootstrap distribution is generated by resampling trials with replacement (keeping the number of trials per condition the same as in the data), while shuffling the temporal windows in time, within epochs. For example: for condition 1, we have 43 trials in the data. In one trial of the bootstrap distribution for condition 1, each of the 3 time windows of the delay period is chosen at random (with replacement) from the 43*3 windows available in the data. The analysis shows that the median variance of preferred directions from the data is significantly larger than the one obtained from the bootstrap samples.
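
    For concreteness, a minimal sketch of this type of bootstrap is given below. It is written against synthetic data with made-up array shapes; the functions and parameter values are illustrative, not the analysis code used for Fig. S1.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    directions = np.arange(8) * 2 * np.pi / 8            # the 8 reach directions

    def preferred_direction(mean_rates_per_condition):
        """PD = phase of the population-vector (cosine-fit) estimate."""
        return np.angle(np.sum(mean_rates_per_condition * np.exp(1j * directions)))

    def circ_var(angles):
        """Circular variance of a set of angles (0 = identical, 1 = maximally dispersed)."""
        return 1.0 - np.abs(np.mean(np.exp(1j * np.asarray(angles))))

    def pd_dispersion(rates):
        """Circular variance of the PDs estimated in each time window.
        rates: (n_conditions, n_trials, n_windows) trial-resolved firing rates."""
        pds = [preferred_direction(rates[:, :, w].mean(axis=1))
               for w in range(rates.shape[2])]
        return circ_var(pds)

    def bootstrap_dispersion(rates, n_boot=1000):
        """Resample trials with replacement and shuffle windows within the epoch,
        so that any genuine within-epoch change of PD is destroyed by construction."""
        n_cond, n_trials, n_win = rates.shape
        null = np.empty(n_boot)
        for b in range(n_boot):
            fake = np.empty_like(rates)
            for c in range(n_cond):
                # each window slot is drawn from the n_trials * n_win available windows
                flat = rates[c].reshape(n_trials * n_win)
                fake[c] = rng.choice(flat, size=(n_trials, n_win), replace=True)
            null[b] = pd_dispersion(fake)
        return null

    # toy usage: 43 trials, 3 delay-period windows, weak cosine tuning plus noise
    rates = 5 + 2 * np.cos(directions)[:, None, None] + rng.normal(0, 1, (8, 43, 3))
    print(pd_dispersion(rates), np.median(bootstrap_dispersion(rates)))
    ```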

    This suggests that neurons do change their preferred direction within epochs, but these changes are smaller in magnitude than the changes that occur between epochs. We explicitly comment on this in the Methods, and in the main text we point out that considering only two epochs is a simplifying assumption; as such, it can be thought of as a first step towards building a more complete model that captures the dynamics of tuning within both the preparatory and execution epochs. Note, however, that this simple framework is enough for the model to recapitulate neuronal activity to a large extent, both at the level of single units and at the population level.

    2. While it is great to constrain the model parameters using the data, there is a glaring "issue" here which I believe is both a weakness and a strength of the approach. The model has a lot of freedom in the external inputs, which leads to relatively severe parameter degeneracies. The authors are entirely forthright about this: they even dedicate a whole section to explaining that depending on the way the cost function is set up, the fit can land the model in very different regimes, yielding very different conclusions. The problem is that I eventually could not decide what to make of the paper's main results about the inferred external inputs, and indeed what to make of the main claim of the abstract. It would be great if the authors could discuss these issues more thoroughly than they currently do, and in particular, argue more strongly about the reasons that might lead one to favour the solutions of Fig 6d/g over that of Fig 6a. On the other hand, I see the proposed model as an interesting playground that will probably enable a more thorough investigation of input degeneracies in RNN models. Several research groups are currently grappling with this; in particular, the authors of LFADS (Pandarinath et al, 2018) and other follow-up approaches (e.g. Schimel et al, 2022) make a big deal of being able to use data to simultaneously learn the dynamics of a neural circuit and infer any external inputs that drive those dynamics, but everyone knows that this is a generally ill-posed problem (see also discussion in Malonis et al 2021, which the authors cite). As far as I know, it is not yet clear what form of regularisation/prior might best improve identifiability. While Bachschmid-Romano et al. do not go very far in dissecting this problem, the model they propose is low-dimensional and more amenable to analytical calculations, such that it provided a valuable playground for future work on this topic.

    We agree with the reviewer that the problem of disambiguating between feedforward and recurrent connections from observation of the state of the recurrent units alone is a degenerate problem in general.
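
    The point can be made concrete with a small numerical example. This is a hedged sketch assuming standard rate dynamics tau * dr/dt = -r + phi(J r + I); the paper's equations may differ in detail. It shows that, for any choice of recurrent couplings, one can solve for an external input that reproduces exactly the same observed activity.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    N, T, dt, tau = 50, 400, 1e-3, 20e-3
    phi, phi_inv = np.tanh, np.arctanh

    # "ground truth": a recurrent network driven by weak noisy input
    J_true = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
    I_true = 0.1 * rng.normal(0, 1, (T, N))
    r = np.zeros((T, N))
    for t in range(T - 1):
        r[t + 1] = r[t] + dt / tau * (-r[t] + phi(J_true @ r[t] + I_true[t]))

    # pick ANY other connectivity (here: none at all) and solve for the external
    # input that reproduces exactly the same trajectory r(t)
    J_alt = np.zeros((N, N))
    drdt = np.diff(r, axis=0) / dt
    I_alt = phi_inv(tau * drdt + r[:-1]) - r[:-1] @ J_alt.T

    # re-simulating with (J_alt, I_alt) leaves the observed activity unchanged
    r2 = np.zeros_like(r)
    for t in range(T - 1):
        r2[t + 1] = r2[t] + dt / tau * (-r2[t] + phi(J_alt @ r2[t] + I_alt[t]))
    print(np.max(np.abs(r - r2)))   # ~0: recordings alone cannot identify J
    ```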

    By explicitly looking for solutions that minimize the role of external inputs in driving the dynamics, we argued that the solutions of Fig 4d/g are preferable to the one of Fig 4a because they rely on local computations implemented through shorter-range connections, rather than on incoming connections from upstream areas; as such, they likely require less metabolic energy.

    In the new version of the paper, we discuss this issue more explicitly:

    Degeneracy of solutions. We considered the case where parameters are inferred by minimizing a cost function that equals the reconstruction error only (this corresponds to the case of very large values of the parameter α in the cost function). Figure 4—figure supplement 2 shows that after minimizing the reconstruction error, the cost function is flat in a large region of the order parameters. We also added Figure 5—figure supplement 5, to show that the dynamics of the feedforward network is almost indistinguishable from that of the recurrent network (Fig. 5), although the average canonical correlation coefficient is slightly lower for the purely feedforward case.

    Breaking the degeneracy of solutions. We added Figure 4—figure supplement 1 to show that for a wide range of the parameter α, all solutions cluster in a small region of parameter space. Solutions are found both above and below the bifurcation line. Note that all solutions are such that the parameters j_A and j_B are close to the bifurcation line that separates the region where tuned network activity requires tuned external input from the region where tuned network activity can be sustained autonomously. Furthermore, the weight of recurrent connections within map B (j_B) is much stronger than the corresponding weight for map A (j_A). Hence, we observe that external inputs play a stronger role in shaping the dynamics during motor preparation than during execution, while recurrent inputs dominate the total inputs during movement execution, for a broad range of values of α. This prediction needs to be tested experimentally, although it is in line with the results of ref. 39, as we explain in the Discussion, section “Interplay between external and recurrent currents”, last paragraph.

    3. As an addition to the motor control literature, this paper's main strengths lie in the model capturing orthogonality between preparatory and movement-related activity subspaces (Elsayed et al 2016), which few models do. However, one might argue that the model is in fact half hand-crafted for this purpose, and half-tuned to neural data, in such a way that it is almost bound to exhibit the phenomenon. Thus, some form of broader model cross-validation would be nice: what else does the model capture about the data that did not explicitly inspire/determine its construction? As a starting point, I would suggest that the authors apply the type of CCA-based analysis originally performed by Sussillo et al (2015), and compare qualitatively to both Sussillo et al. (2015) and Kao et al (2021). Also, as every recorded monkey M1 neuron can be characterized by its coordinates in the 4-dimensional space of angular tuning, it should be straightforward to identify the closest model neuron; it would be very compelling to show side-by-side comparisons of single-neuron response timecourses in model and monkey (i.e., extend the comparison of Fig S6 to the temporal domain).

    We thank the reviewer for these suggestions. We have added the following comparisons:

    ● A CCA-based analysis (Fig 5.a) shows that the performance of our model is qualitatively comparable to that of Sussillo et al. (2015) and Kao et al. (2021) at generating realistic motor cortical activity (average canonical correlation ρ = 0.77 during movement preparation and 0.82 during movement execution). A schematic sketch of this type of comparison is given after this list.

    ● For each of the 141 neurons in the data, we selected the corresponding unit in the model that is closest in the η- and θ-parameter space:

    a) A side-by-side comparison of the time course of responses shows a good qualitative agreement (Fig 5.c).

    b) We successfully trained a linear decoder to read the responses of these 141 neurons from simulations and output trial-averaged EMG activity recorded from a monkey performing the same task (Fig 5.b).

    c) Figure 5—figure supplement 4 shows that simulated data presents sequential activity, as does the recorded data.
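
    A hedged sketch of this kind of CCA comparison (in the spirit of Sussillo et al., 2015; the exact preprocessing used for Fig 5.a may differ) is:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import CCA

    def average_canonical_correlation(data_rates, model_rates, n_components=10):
        """data_rates, model_rates: (conditions*time, units) condition-averaged activity."""
        # project each population onto its top principal components first
        X = PCA(n_components).fit_transform(data_rates)
        Y = PCA(n_components).fit_transform(model_rates)
        Xc, Yc = CCA(n_components=n_components).fit(X, Y).transform(X, Y)
        # canonical correlations = correlations between paired canonical variates
        rhos = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(n_components)]
        return np.mean(rhos)

    # toy usage with random surrogates (real use: trial-averaged, preprocessed PSTHs)
    rng = np.random.default_rng(2)
    data = rng.normal(size=(8 * 100, 141))     # 8 conditions x 100 time bins, 141 neurons
    model = data @ rng.normal(size=(141, 141)) * 0.5 + rng.normal(size=(8 * 100, 141))
    print(average_canonical_correlation(data, model))
    ```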

    In our simulations, the temporal variability in single-neuron responses is due to the temporal evolution of the inferred external inputs, and to noise, implemented by an Ornstein-Uhlenbeck (OU) process that is added to the total inputs. Another source of variability could be introduced in the synaptic connectivity: one could add a Gaussian random variable to each synaptic efficacy, for example. We checked that this simple extension of our model is able to reproduce the dynamics of the order parameters seen in the data. A full characterization of this extended model is beyond the scope of our paper.
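
    For reference, a minimal sketch of such an OU input-noise process (illustrative parameter values, not the ones used in our simulations):

    ```python
    import numpy as np

    def ou_noise(n_steps, n_neurons, dt=1e-3, tau_noise=50e-3, sigma=0.5, rng=None):
        """dx = -x/tau * dt + sigma * sqrt(2*dt/tau) * N(0,1); stationary std ~ sigma."""
        rng = rng or np.random.default_rng()
        x = np.zeros((n_steps, n_neurons))
        for t in range(n_steps - 1):
            x[t + 1] = (x[t] - x[t] * dt / tau_noise
                        + sigma * np.sqrt(2 * dt / tau_noise) * rng.normal(size=n_neurons))
        return x

    # this array would simply be added to the total (recurrent + external) input
    noise = ou_noise(n_steps=2000, n_neurons=141)
    print(noise.std())   # ~sigma once the process has equilibrated
    ```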

    4. The paper's clarity could be improved.

    We thank the reviewer for his feedback. We have significantly rewritten most sections of the paper to improve clarity.

    Reviewer #2 (Public Review):

    The authors study M1 cortical recordings in two non-human primates performing straight delayed center-out reaches to one of 8 peripheral targets. They build a model for the data with the goal of investigating the interplay of inferred external inputs and recurrent synaptic connectivity and their contributions to the encoding of preferred movement direction during movement preparation and execution epochs. The model assumes neurons encode movement direction via a cosine tuning that can be different during preparation and execution epochs. As a result, each type of neuron in the model is described with four main properties: their preferred direction in the cosine tuning during preparation (denoted by θ_A) and execution (denoted by θ_B) epochs, and the strength of their encoding of the movement direction during the preparation (denoted by η_A) and execution (denoted by η_B) epochs. The authors assume that a recurrent network that can have different inputs during the preparation and execution epochs has generated the activity in the neurons. In the model, these inputs can both be internal to the network or external. The authors fit the model to real data by optimizing a loss that combines, via a hyperparameter α, the reconstruction of the cosine tunings with a cost to discourage/encourage the use of external inputs to explain the data. They study the solutions that would be obtained for various values of α. The authors conclude that during the preparatory epoch, external inputs seem to be more important for reproducing the neuron's cosine tunings to movement directions, whereas during movement execution external inputs seem to be untuned to movement direction, with the movement direction rather being encoded in the direction-specific recurrent connections in the network.

    Major:

    1. Fundamentally, without actually simultaneously recording the activity of upstream regions, it should not be possible to rule out that the seemingly recurrent connections in the M1 activity are actually due to external inputs to M1. I think it should be acknowledged in the discussion that inferred external inputs here are dependent on assumptions of the model and provide hypotheses to be validated in future experiments that actually record from upstream regions. To convey with an example why I think it is critical to simultaneously record from upstream regions to confirm these conclusions, consider two alternative scenarios: I) The recorded neurons in M1 have some recurrent connections that generate a pattern of activity that, based on the modeling, seems to be recurrent. II) The exact same activity has been recorded from the same M1 neurons, but these neurons have absolutely no recurrent connections themselves, and are rather activated via purely feed-forward connections from some upstream region; that upstream region has recurrent connections and is generating the recurrent-like activity that is later echoed in M1. These two scenarios can produce the exact same M1 data, so they should not be distinguishable purely based on the M1 data. To distinguish them, one would need to simultaneously record from upstream regions to see if the same recurrent-like patterns that are seen in M1 were already generated in an upstream region or not. I think acknowledging this major limitation and discussing the need to eventually confirm the conclusions of this modeling study with actual simultaneous recordings from upstream regions is critical.

    We agree with the reviewer that it is not possible to rule out the hypothesis that motor cortical activity is purely generated by feedforward connectivity.

    In the new version of the paper, we discuss more explicitly the fact that neural activity can be fully explained by feedforward inputs, and we added Figure 5—figure supplement 5 to show that the dynamics of the feedforward network is almost indistinguishable from that of the recurrent network (Fig. 5), provided its parameters are appropriately tuned. Notice, however, that a canonical correlation analysis comparing the recorded activity with the simulated one shows that the average canonical correlation coefficient is slightly lower for the case of a purely feedforward network (Fig. 5.a vs Fig. S12.a).

    A summary of our approach is:

    • We observe that both a purely feedforward and a recurrent network can reproduce the temporal course of the recordings equally well (see also our answer to question 5 below);

    • We point out that a solution that would save metabolic energy consumption is one where the activity is generated by recurrent currents (with shorter range local connections) rather than by feedforward inputs from upstream regions (long-range connections).

    • We study the solution that best reproduces the recorded activity and minimizes inputs from upstream regions.

    In the Discussion, we included the Reviewer’s observation that our hypothesis needs to be tested by simultaneous recordings of M1 and upstream regions, as well as measures of synaptic strength between motor cortical neurons. See the second paragraph of page 14: “Our prediction (…) will be necessary to rule out alternative explanations”. Yet, we think that the results of reference [51] are consistent with our results.

    One last point we would like to stress is that external inputs drive the network's dynamics at all times, even in the solution that we argue would save metabolic energy consumption: untuned inputs are present throughout the whole course of the motor action, including during movement execution, and they determine the precise temporal pattern of neurons' firing rates.

    2. The ring network model used in this work implicitly relies on the assumption that cosine-tuning models are good representations of the recorded M1 neuronal activity. However, this assumption is not quantitatively validated in the data. Given that all conclusions depend on this, it would be important to provide some goodness of fit measure for the cosine tuning models to quantify how well the neurons' directional preferences are explained by cosine tunings. For example, reporting a histogram of the cosine tuning fit error over all neurons in Fig 2 would be helpful (currently example fits are shown only for a few neurons in Fig. 2 (a), (b), and Figure S6(b)). This would help quantitatively justify the modeling choice.

    We thank the reviewer for this observation. Fig. S2.e-f shows the R^2 coefficient of the cosine fit; in particular, we show that the R^2 of the cosine fit strongly correlates with the variables η, which represent the degree of participation of single units in the recurrent currents. Units with higher η (the ones that contribute more to the recurrent currents) are the ones whose tuning curves better resemble a cosine. However, the plot also shows that the R^2 coefficient of the cosine fit is quite low for many cells. To show that a model with cosine tuning can yield this result, we repeated the same analysis on the units in our simulated network. In our simulations, all neurons receive a stochastic input mimicking the large fluctuations around mean inputs that are expected to occur in vivo. We selected the 141 units whose activity most strongly resembled the activity of the 141 recorded neurons (see figure caption for details). We then looked at the tuning curves of these 141 units from simulations, and calculated the R^2 coefficient of the cosine fit. Figure 5—figure supplement 2.c shows that the result agrees well with the data: the R^2 coefficient is quite low for many neurons, and correlates with the variable η. To summarize, a model that assumes cosine tuning, but also incorporates noise in the dynamics, reproduces well the R^2 coefficient of the cosine fit of tuning curves from data. We added the paragraph “Cosine tuning” in the Discussion to comment on this point.
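
    A minimal sketch of the cosine fit and its R^2 (the fitting details in the paper, e.g. how trials are averaged, may differ):

    ```python
    import numpy as np

    directions = np.arange(8) * 2 * np.pi / 8        # the 8 reach directions

    def fit_cosine_tuning(mean_rates):
        """Least-squares fit of r(theta) = b0 + b1*cos(theta) + b2*sin(theta),
        i.e. r(theta) = b0 + A*cos(theta - PD); returns (PD, A, R^2)."""
        X = np.column_stack([np.ones_like(directions),
                             np.cos(directions), np.sin(directions)])
        beta, *_ = np.linalg.lstsq(X, mean_rates, rcond=None)
        pred = X @ beta
        ss_res = np.sum((mean_rates - pred) ** 2)
        ss_tot = np.sum((mean_rates - mean_rates.mean()) ** 2)
        pd = np.arctan2(beta[2], beta[1])            # preferred direction
        amp = np.hypot(beta[1], beta[2])             # modulation depth
        return pd, amp, 1.0 - ss_res / ss_tot

    # toy usage: a weakly tuned, noisy unit gives a low R^2 despite true cosine tuning
    rng = np.random.default_rng(3)
    rates = 10 + 2 * np.cos(directions - 1.0) + rng.normal(0, 3, 8)
    print(fit_cosine_tuning(rates))
    ```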

    3. The authors explain that the two-cylinder model that they use has "distinct but correlated" maps A and B during the preparation and movement. This is hard to see in the formulation. It would be helpful if the authors could expand in the Results on what they mean by "correlation" between the maps and which part of the model enforces the correlation.

    We thank the reviewer for this comment. By correlation, we meant the correlation between neural activity during the preparatory and movement-related temporal intervals. In the model, the correlation between the vectors θA and θB induces correlation in the preparatory and movement-related activity patterns. To make the paper easier to read, we are not mentioning this concept in the Results; in the Discussion, we explicitly refer to it in the following two paragraphs:

    “A strong correlation between the selectivity properties of the preparatory and movement-related epochs will produce strongly correlated patterns of activity in these two intervals and a strong overlap between the respective PCA subspaces.” (Discussion, section Orthogonal spaces dedicated to movement preparation and execution)

    “The correlation between the vectors θ_A and θ_B” (Discussion, section Interplay between external and recurrent currents)

    4. The authors note that a key innovation in the model formulation here is the addition of participation strength parameters (η_A, η_B) to prior two-cylinder models to represent the degree of a neuron's participation in the encoding of the circular variable in either map. The authors state that this is critical for explaining the cosine tunings well: "We have discussed how the presence of this dimension is key to having tuning curves whose shape resembles the one computed from data, and decreases the level of orthogonality between the subspaces dedicated to the preparatory and movement-related activity". However, I am not sure where this is discussed. To me, it seems like to show that an additional parameter is necessary to explain the data well, one would need to compare fit to data between the model with that parameter and a model without that parameter. I don't think such a comparison was provided in the paper. It is important to show such a comparison to quantitatively show the benefit of the novel element of the model.

    We thank the reviewer for this comment.

    ● The key observation is that without the parameters η_A, η_B, the temporal evolution of all neurons in the network is the same (only the noise term added to the dynamics is different). To show this, we have performed a comparison of the temporal evolution of the firing rates of single neurons of the model with data. Fig 5.c shows a comparison between the time course of single-neuron firing rates from data and simulations (good agreement), while Figure 6—figure supplement 2.a shows the same comparison for a model in which all neurons have the same value of the η_A, η_B parameters (worse agreement: the range of firing rates is the same for all neurons). In summary, the parameters η_A, η_B introduce the variability in the coupling strengths that is necessary to generate heterogeneity in neuronal responses.

    ● At the end of section “PCA subspaces dedicated to movement preparation and execution”, we refer to Figure 6—figure supplement 2.c, showing that a model with η_A = η_B = 1 for all neurons yields less orthogonal subspaces. A sketch of one common way to quantify this orthogonality is given after this list.
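
    A hedged sketch of such a quantification: an alignment index in the spirit of Elsayed et al. (2016), i.e. the fraction of movement-epoch variance captured by the top preparatory principal components. The paper's exact measure may differ.

    ```python
    import numpy as np

    def alignment_index(prep_activity, move_activity, n_dims=10):
        """Fraction of movement-epoch variance captured by the top preparatory PCs
        (1 = fully aligned subspaces, ~0 = orthogonal). Inputs: (samples, neurons)."""
        prep_c = prep_activity - prep_activity.mean(axis=0)
        move_c = move_activity - move_activity.mean(axis=0)
        _, _, Vt = np.linalg.svd(prep_c, full_matrices=False)
        P = Vt[:n_dims].T                              # top preparatory directions
        C_move = np.cov(move_c, rowvar=False)
        captured = np.trace(P.T @ C_move @ P)
        best_possible = np.sum(np.sort(np.linalg.eigvalsh(C_move))[-n_dims:])
        return captured / best_possible

    # toy usage: activity confined to two random (hence nearly orthogonal) subspaces
    rng = np.random.default_rng(4)
    basis_prep = np.linalg.qr(rng.normal(size=(141, 10)))[0]
    basis_move = np.linalg.qr(rng.normal(size=(141, 10)))[0]
    prep = rng.normal(size=(400, 10)) @ basis_prep.T
    move = rng.normal(size=(400, 10)) @ basis_move.T
    print(alignment_index(prep, move))   # small value, ~10/141 for random subspaces
    ```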

    5. The model parameters are fitted by minimizing a total cost that is a weighted average of two costs as E_tot = α E_rec + E_ext, with the hyperparameter α determining how the two costs are combined. The selection of α is key in determining how much the model relies on external inputs to explain the cosine tunings in the data. As such, the conclusions of the paper rely on a clear justification of the selection of α and a clear discussion of its effect. Otherwise, all conclusions can be arbitrary confounds of this selection and thus unreliable. Most importantly, I think there should be a quantitative fit to data measure that is reported for different scenarios to allow comparison between them (also see comment 2). For example, when arguing that α should be "chosen so that the two terms have equal magnitude after minimization", this would be convincing if somehow that selection results in a better fit to the neural data compared with other values of α. If all such selections of α have a similar fit to neural data, then how can the authors argue that some are more appropriate than others? This is critical since small changes in alpha can lead to completely different conclusions (Fig. 6, see my next two comments).

    All the points raised in questions 5 to 8 are interrelated, and we address them below, after Major issue 8.

    6. The authors seem to select alpha based on the following: "The hyperparameter α was chosen so that the two terms have equal magnitude after minimization (see Fig. S4 for details)". Why is this the appropriate choice? The authors explain that this will lead to the behavior of the model being close to the "bifurcation surface". But why is that the appropriate choice? Does it result in a better fit to neural data compared with other choices of α? It is critical to clarify and justify as again all conclusions hinge on this choice.
    7. Fig 6 shows example solutions for two close values of α, and how even slight changes in the selection of α can change the conclusions. In Fig. 6 (d-e-f), α is chosen as in the default approach, such that the two terms E_rec and E_ext have equal magnitude. Here, as the authors note, during movement execution tuned external inputs are zero. In contrast, in Fig. 6 (g-h-i), α is chosen so that the E_rec term has a "slightly larger weight" than the E_ext term so that there is less penalty for using large external inputs. This leads to a different conclusion whereby "a small input tuned to θ_B is present during movement execution". Is one value of α a better fit to neural data? Otherwise, how do the authors justify key conclusions such as the following, which seems to be based on the first choice of α shown in Fig. 6 (d-e-f): "...observed patterns of covariance are shaped by external inputs that are tuned to neurons' preferred directions during movement preparation, and they are dominated by strong direction-specific recurrent connectivity during movement execution".
    8. It would be informative to see the extreme cases of very large and very small α. For example, if α is very large such that external inputs are practically not penalized, would the model rely purely on external inputs (rather than recurrent inputs) to explain the tuning curves? This would be an example of the hypothetical scenario mentioned in my first comment. Would this result in a worse fit to neural data?

    We agree with the reviewer that it is crucial to discuss how the choice of the parameter alpha affects the results, and we have strived to improve this discussion in the revised manuscript.

    I. When we looked for the coupling parameters that best explain the data, without introducing a metabolic cost, we found multiple solutions that were equally good (see Figure 4—figure supplement 2 and our answer to question (1) above). These included the solution with all couplings set to zero (j_s^B = j_s^A = j_a = 0), as well as many solutions with different values of the synaptic coupling parameters. The solution with the strongest couplings is close to the bifurcation line, in the area where j_s^B > j_s^A.

    II. We then introduced a metabolic cost to break the degeneracy between these different solutions. The cost function we minimized contains two terms; their relative strength is modulated by alpha. The case of very small alpha (i.e., only minimizing external input) yields a very poor reconstruction of neural dynamics and is not interesting. The case of very large alpha reduces to the case (I) above. We added Figure 4—figure supplement 1 to show the results for intermediate values of alpha - alpha is large enough to yield a good reconstruction of neural dynamics, yet small enough to ensure that we find a unique solution. For these intermediate values of alpha, the two terms of the cost function have comparable magnitudes. Although slight changes in the selection of alpha do change whether the solutions are above or below the bifurcation surface, Figure 4—figure supplement 1 shows that all solutions are close to the bifurcation surface. In particular, the value of j_s^B is close to its critical value, while we never find solutions where j_s^A is close to its critical value - we never find solutions in the lower-right region of the plot in Figure 4—figure supplement 1. The critical value for j_s^B is the one above which no tuned external inputs are necessary to sustain the observed activity during movement execution. For values of j_s^B close to the bifurcation line but below it (for example, Fig.4g) inferred tuned inputs are still much weaker than the untuned ones, during movement execution. Also, the inferred direction-specific couplings are strong and amplify the weak external inputs tuned to map B, therefore still playing a major role in shaping the observed dynamics during movement execution.
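
    To illustrate why the external-input term pushes the execution-epoch coupling toward its critical value, here is a toy calculation (a linearized, one-map caricature with an assumed gain g, not the paper's mean-field equations): the tuned input needed to sustain a given tuning amplitude shrinks linearly as the coupling approaches the bifurcation, so a penalty on external inputs favors couplings near it.

    ```python
    import numpy as np

    g = 1.0            # effective gain of the transfer function (assumed)
    m_exec = 0.8       # target tuning amplitude in map B during execution (illustrative)
    j_c = 1.0 / g      # critical coupling: above it, the tuned state is self-sustained

    for j_B in np.linspace(0.0, j_c, 6):
        # linearized fixed point m = g*(j_B*m + I)  =>  required tuned input:
        I_tuned = m_exec * (1.0 / g - j_B)
        print(f"j_B = {j_B:.2f}   required tuned input = {I_tuned:.2f}")
    # the required tuned input decreases linearly and vanishes at j_B = j_c
    ```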

    We have rewritten accordingly the abstract, introduction and conclusions of the paper. Instead of focusing on only one solution for a particular value of alpha, we now discuss all solutions and their implications.

    9. The authors argue in the discussion that "the addition of an external input strength minimization constraint breaks the degeneracy of the space of solutions, leading to a solution where synaptic couplings depend on the tuning properties of the pre- and post-synaptic neurons, in such a way that in the absence of a tuned input, neural activity is localized in map B". In other words, the use of the E_ext term apparently reduces "degeneracy" of the solution. This was not clear to me and I'm not sure where it is explained. This is also related to α because if alpha goes toward very large values, it would be like the E_ext term is removed, so it seems like the authors are saying that the solution becomes degenerate if alpha grows very large. This should be clarified.

    We thank the reviewer for pointing this out. By degeneracy of solutions, we mean that the model can explain the data equally well for different choices of the recurrent coupling parameters (j_s^A, j_s^B, j_a). In other words, if we look for the coupling parameters that best explain the data, there are many equivalent solutions. When we introduce the E_ext term in the cost function, we then find one unique solution for each choice of alpha. So by “breaking the degeneracy”, we mean going from a scenario where there are many equally valid solutions to one with a single solution. We added this explanation in the paper, along with an explanation of how our conclusions depend on the choice of alpha.

    10. How do the authors justify setting Φ_A = Φ_B in equation (5)? In other words, how is the last assumption in the following sentence justified: "To model the data, we assumed that the neurons are responding both to recurrent inputs and to fluctuating external inputs that can be either homogeneous or tuned to θ_A; θ_B, with a peak at constant location Φ_A = Φ_B ≡ Φ". Does this mean that the preferred direction for a given neuron is the same during preparation and movement epochs? If so, how is this consistent with the not-so-high correlation between the preferred directions of the two epochs shown in Fig. 2 c, which is reported to have a circular correlation coefficient of 0.4?

    We would like to stress the important distinction between the parameters θ and the parameters Φ. While the parameters θ_A and θ_B represent the preferred direction of single neurons during the preparatory and execution epochs, respectively, the parameters Φ_A, Φ_B represent the direction of motion that is encoded at the population level during these two epochs. The mean-field analysis shows that Φ_A = Φ_B, even though single neurons change their preferred direction from one epoch to the next. We added a more extensive explanation of the order parameters in the Results section.
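
    To illustrate the distinction numerically, here is a hedged sketch in which the order parameters are written as population vectors weighted by the participation strengths (the paper's exact definitions may differ): even though every neuron changes its preferred direction between the two maps, the phase of the population-level order parameter is the same in both epochs.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    N = 1000
    theta_A = rng.uniform(0, 2 * np.pi, N)     # single-neuron PDs, preparatory epoch
    theta_B = rng.uniform(0, 2 * np.pi, N)     # single-neuron PDs, execution epoch (different)
    eta_A = rng.lognormal(0, 0.5, N)           # participation strengths (illustrative)
    eta_B = rng.lognormal(0, 0.5, N)
    target = np.pi / 3                         # reach direction on this trial

    def order_parameter(rates, eta, theta):
        """Population vector in one map: amplitude of tuning and encoded direction Phi."""
        m = np.mean(eta * rates * np.exp(1j * theta))
        return np.abs(m), np.angle(m)

    # activity localized around the target in map A (prep) and in map B (move)
    r_prep = np.maximum(0.0, 1 + eta_A * np.cos(theta_A - target))
    r_move = np.maximum(0.0, 1 + eta_B * np.cos(theta_B - target))

    print(order_parameter(r_prep, eta_A, theta_A)[1])   # Phi_A ~ target
    print(order_parameter(r_move, eta_B, theta_B)[1])   # Phi_B ~ target, despite new PDs
    ```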

    Reviewer #3 (Public Review):

    In this work, Bachschmid-Romano et al. propose a novel model of the motor cortex, in which the evolution of neural activity throughout movement preparation and execution is determined by the kinematic tuning of individual neurons. Using analytic methods and numerical simulations, the authors find that their networks share some of the features found in empirical neural data (e.g., orthogonal preparatory and execution-related activity). While the possibility of a simple connectivity rule that explains large features of empirical data is intriguing and would be highly relevant to the motor control field, I found it difficult to assess this work because of the modeling choices made by the authors and how the results were presented in the context of prior studies.

    Overall, it was not clear to me why Bachschmid-Romano et al. couched their models within a cosine-tuning framework and whether their results could apply more generally to more realistic models of the motor cortex. Under cosine-tuning models (or kinematic encoding models, more generally), the role of the motor cortex is to represent movement parameters so that they can presumably be read out by downstream structures. Within such a framework, the question of how the motor cortex maintains a stable representation of movement direction throughout movement preparation and execution when the tuning properties of individual neurons change dramatically between epochs is highly relevant. However, prior work has demonstrated that kinematic encoding models provide a poor fit for empirical data. Specifically, simple encoding models (and the more elaborate extensions [e.g., Inoue, et al., 2018]) cannot explain the complexity of single-neuron responses (Churchland and Shenoy, 2007), do not readily produce the population-level signals observed in the motor cortex (Michaels, Dann, and Scherberger, 2016), and cannot be extended to more complex movements (Russo, et al., 2018).

    In both the Introduction and Discussion, the authors heavily cite an alternative to kinematic encoding models, the dynamical systems framework. Here, the correlations between kinematics and neural activity in the motor cortex are largely epiphenomenal. The motor cortex does not 'represent' anything; its role is to generate patterns of muscle activity. While the authors explicitly acknowledge the shortcomings of encoding models ('Extension to modeling richer movements', Discussion) and claim that their proposed model can be extended to 'more realistic scenarios', they neither demonstrate that their models can produce patterns of muscle activity nor that their model generates realistic patterns of neural activity. The authors should either fully characterize the activity in their networks and make the argument that their models provide a better fit to empirical data than alternative models, or demonstrate that more realistic computations can be explained by the proposed framework.

    Major Comments

    1. In the present manuscript, it is unclear whether the authors are arguing that representing movement direction is a critical computation that the motor cortex performs, and the proposed models are accurate models of the motor cortex, or if directional coding is being used as a 'proof of concept' that demonstrates how specific, population-level computations can be explained by the tuning of individual neurons.

    If the authors are arguing the former, then they need to demonstrate that their models generate activity similar to what is observed in the motor cortex (e.g., realistic PSTHs and population-level signals). Presently, the manuscript only shows tuning curves for six example neurons (Fig. S6) and a single jPC plane (Fig. S8). Regarding the latter, the authors should note that Michaels et al. (2016) demonstrated that representational models can produce rotations that are superficially similar to empirical data, yet are not dependent on maintaining an underlying condition structure (unlike the rotations observed in the motor cortex).

    If the authors are arguing the latter - and they seem to be, based on the final section of the Discussion - then they need to demonstrate that their proposed framework can be extended to what they call 'more realistic scenarios'. For example, could this framework be extended to a network that produces patterns of muscle activity?

    We thank the reviewer for raising these issues.

    Is our model a kinematic encoding model or a dynamical system?

    Our model is a dynamical system, as can be seen by inspecting equations (1,2). The main difference between our model and recently proposed dynamical system models of motor cortex is that the synaptic connectivity matrix in our model is built from the tuning properties of neurons, instead of being trained using supervised learning techniques (we come back to this important difference below). Since the network’s connectivity and external input depend on the neurons’ tuning to the direction of motion (eq 5-6), kinematic parameters emerge from the dynamic interaction between recurrent and feedforward currents, as specified by equations (1-6). Thus, kinematic parameters can be decoded from population activity.
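
    To make this concrete, a minimal sketch of this kind of dynamical system is given below: connectivity built from the tuning parameters of the two maps, weighted by the participation strengths, driving a standard rate equation. The coupling values, transfer function and input shapes are illustrative stand-ins for the paper's equations (1-6), not the fitted ones.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    N, dt, tau = 500, 1e-3, 20e-3
    theta_A = rng.uniform(0, 2 * np.pi, N)                          # preparatory-epoch PDs
    theta_B = (theta_A + rng.vonmises(0.0, 2.0, N)) % (2 * np.pi)   # execution PDs, correlated with map A
    eta_A = rng.lognormal(0, 0.5, N)                                # participation strengths
    eta_B = rng.lognormal(0, 0.5, N)
    j_A, j_B = 0.5, 1.5                                             # map-specific couplings (toy values)

    # tuning-dependent, low-rank connectivity: one cosine kernel per map
    J = (j_A * np.outer(eta_A, eta_A) * np.cos(theta_A[:, None] - theta_A[None, :])
         + j_B * np.outer(eta_B, eta_B) * np.cos(theta_B[:, None] - theta_B[None, :])) / N

    phi = lambda x: np.maximum(0.0, np.tanh(x))                     # toy transfer function
    Phi = np.pi / 3                                                 # reach direction on this trial

    def external_input(t, t_go=0.3):
        """Tuned to map A during preparation; mostly untuned, with a weak map-B
        component, after the go cue (one of the regimes discussed in the paper)."""
        if t < t_go:
            return 0.5 + 0.5 * eta_A * np.cos(theta_A - Phi)
        return 1.0 + 0.1 * eta_B * np.cos(theta_B - Phi)

    r = np.zeros(N)
    for step in range(600):
        r = r + dt / tau * (-r + phi(J @ r + external_input(step * dt)))

    # direction encoded in map B at the end of the "movement" epoch
    m_B = np.mean(eta_B * r * np.exp(1j * theta_B))
    print(np.angle(m_B))   # close to Phi: map-B recurrence amplifies the weak tuned input
    ```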

    While in kinematic encoding models neurons’ firing rates are a function of parameters of the movement, we constrained the parameters of our model by requiring the model to reproduce the dynamics of a few order parameters, which are low-dimensional measures of the activity of recorded neurons. Our model is fitted to neural data, not to the parameters of the movement.

    Although we observed that a linear decoder of the network's activity can reproduce patterns of muscle activity without decoding any kinematic parameter (see below), discussing whether tuning in M1 plays a computational role in controlling muscle activity is outside the scope of our work. Rather, the scope of our paper is to discuss how a specific connectivity structure can generate the observed patterns of neural activity, and which connectivity structure requires minimum external inputs to sustain the dynamics. In our approach, the correlations between kinematics and neural activity in the motor cortex are not merely epiphenomenal, but emerge from a specific structure of the connectivity that has likely been shaped by Hebbian-like learning mechanisms.

    Can the model generate realistic PSTHs and patterns of muscle activity? Yes, it can. As suggested, we have added the following comparisons:

    ● A CCA-based analysis (Fig 5.a) shows that the performance of our model is qualitatively comparable to that of Sussillo et al. (2015) and Kao et al. (2021) at generating realistic motor cortical activity (average canonical correlation ρ = 0.77 for motor preparation, 0.82 for motor execution).

    ● For each of the 141 neurons in the data, we selected the most similar unit in the model (the closest unit in the η- and θ-parameter space, i.e., the one with the smallest Euclidean distance in the space defined by (θ_A, θ_B, η_A, η_B)). A side-by-side comparison of the time course of responses (Fig 5.c) shows a good qualitative agreement. A sketch of this matching step, together with the EMG decoding below, is given after this list.

    ● We successfully trained a linear decoder to read the responses of these 141 units from simulations and output trial-averaged EMG activity recorded from a monkey performing the same task (Fig 5.b).

    ● The model displays sequential activity and rotational dynamics (Fig. S10) without the need to introduce neuron-specific latencies (Michaels, Dann, and Scherberger, 2016).
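
    A hedged sketch of the matching and decoding steps above (the distance metric, regularization, and array shapes are assumptions for illustration, not the exact choices made in the paper):

    ```python
    import numpy as np

    def match_units(data_params, model_params):
        """For each recorded neuron (columns: theta_A, theta_B, eta_A, eta_B), return
        the index of the closest model unit, using circular distance for the angles."""
        d_tA = np.angle(np.exp(1j * (data_params[:, None, 0] - model_params[None, :, 0])))
        d_tB = np.angle(np.exp(1j * (data_params[:, None, 1] - model_params[None, :, 1])))
        d_eA = data_params[:, None, 2] - model_params[None, :, 2]
        d_eB = data_params[:, None, 3] - model_params[None, :, 3]
        return np.argmin(d_tA**2 + d_tB**2 + d_eA**2 + d_eB**2, axis=1)

    def fit_linear_emg_decoder(unit_rates, emg, reg=1e-3):
        """Ridge-regression readout from matched units' rates (time x units) to
        trial-averaged EMG (time x muscles); returns weights, last row = offset."""
        X = np.hstack([unit_rates, np.ones((unit_rates.shape[0], 1))])
        return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ emg)

    # toy usage with random stand-ins for the fitted parameters and simulated rates
    rng = np.random.default_rng(7)
    data_params = np.column_stack([rng.uniform(0, 2 * np.pi, (141, 2)),
                                   rng.lognormal(0, 0.5, (141, 2))])
    model_params = np.column_stack([rng.uniform(0, 2 * np.pi, (5000, 2)),
                                    rng.lognormal(0, 0.5, (5000, 2))])
    idx = match_units(data_params, model_params)      # 141 matched model units
    rates = rng.normal(size=(800, 5000))[:, idx]      # their simulated rates over time
    emg = rates @ rng.normal(size=(141, 7)) * 0.1     # surrogate EMG for 7 muscles
    W = fit_linear_emg_decoder(rates, emg)
    print(W.shape)                                    # (142, 7); predicted EMG = [rates, 1] @ W
    ```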

    Can our model explain the complexity of single-neuron tuning?

    We have shown that our model captures the heterogeneity of neural responses. Yet, it has been shown that neurons’ tuning properties depend on many features of movement. For example, the current version of the model does not describe the dependence of tuning on speed (Churchland and Shenoy, 2007). However, our model could be extended to incorporate it. Preliminary results suggest that, in a network model in which neurons differ by the degree of symmetry of their synaptic connectivity, the speed of neural trajectories can be modulated by external inputs that preferentially target asymmetrically connected neurons. In our model, all connections are a sum of a symmetric and an asymmetric term. We could extend our model to incorporate variability in the degree of symmetry of the connections, and we speculate that in such a model tuning would depend on the speed of movement, for appropriate forms of external inputs. We leave this study to future work.

    Can our model explain neural activity underlying more complex trajectories? When limb trajectories are more complex than simple reaches (Russo, et al., 2018), a single neuron’s activity displays intricate response patterns. Our work could be extended to model more complex movements in several ways. A simplifying assumption we made is that the task can be clearly separated into a preparatory phase and a movement-related phase. A possible extension is one where the motor action is composed of a sequence of epochs, corresponding to a sequence of maps in our model. It will be interesting to study the role of asymmetric connections for storing a sequence of maps. Such a network model could be used to study the storage of motor motifs in the motor cortex (Logiaco et al, 2021); external inputs could then combine these building blocks to compose complex actions.

    In summary, we proposed a simple model that can explain recordings during a straight-reaching task. It provides a scaffold upon which we can build more sophisticated models to explain the activity underlying more complex tasks. We point out that a similar limitation is present in modeling approaches where a network is trained to produce specific patterns of neural or muscle activity. The question of whether and how trained recurrent networks can generalize is not yet solved, although it is currently under investigation (e.g., Dubreuil et al 2022; Driscoll et al 2022).

    What is the advantage of the present model, compared to an RNN trained to output specific neural/muscle activity?

    Its simplicity. Our model is a low-rank recurrent neural network: the structure of the connectivity matrix is simple enough to allow for analytical tractability of the dynamics. The model can be used to test specific hypotheses on the relationship between network connectivity, external inputs and neural dynamics, and to test hypotheses on the learning mechanisms that may lead to the emergence of a given connectivity structure. The model is also helpful to illustrate the problem of degeneracy of network models. An interesting future direction would be to compare the connectivity matrices of trained RNNs and our model.

    We addressed these points in the Discussion, in sections: “Representational vs dynamical system approaches” and “Extension to modeling activity underlying more complex tasks.”

    2. Related to the above point, the authors claim in the Abstract that their models 'recapitulate the temporal evolution of single-unit activity', yet the only evidence they present is the tuning curves of six example units. Similarly, the authors should more fully characterize the population-level signals in their networks. The inferred inputs (Fig. 6) indeed seem reasonable, yet I'm not sure how surprising this result is. Weren't the authors guaranteed to infer a large, condition-invariant input during movement and condition-specific input during preparation simply because of the shape of the order parameters estimated from the data (Fig. 6c, thin traces)?

    We thank the reviewer for this comment. Regarding the first part of the question: we added new plots with more comparisons between the activity of our model and neural recordings (see the answer above referring to Fig 5).

    Regarding the second part: It is true that the shape of the latent variables that we measure from data constrains the solution that we find. However, a “condition-invariant input during movement and condition-specific input during preparation” is not the only scenario compatible with the data. Let’s take a step back and focus on the parameters that we are inferring from data. We are inferring both the strength of external inputs and the coupling parameters. This is done in a two-step inference procedure: we start from a random guess of the coupling parameters, then we infer the strength of the external inputs, and finally we compute the cost function, which depends on all parameters. This is done iteratively, by moving in the space of the coupling parameters; for each point in the space of the coupling parameters, there is one possible configuration of external inputs. The space of the coupling parameters is shown in Fig 4.a, for example (see also Fig. S4). The solutions that we find do not trivially follow from the shape of the latent variables. For example, one possible solution could be: large parameter j_s^A, small parameter j_s^B, which corresponds to a point in the lower-right region of the parameter space in Fig 4.a (Fig. S4). The resulting external input would be a strong condition-specific external input during movement execution, but a condition-invariant input during movement preparation: the model is such that, for example, briefly exciting a few neurons whose preferred direction corresponds to the direction of motion would be enough to “set the direction of motion” for the network; the pattern of tuned activity could be sustained during the whole delay period thanks to the strong recurrent connections j_s^A. We could not rule out this solution by simply looking at the shape of the latent variables. However, it is a solution we have never observed. We only found solutions in the region where j_s^B is large and close to its critical value. This implies the presence of condition-specific inputs during the whole delay period, and condition-invariant external inputs that dominate over condition-specific ones during movement execution.

    3. In the Abstract and Discussion (first paragraph), the authors highlight that the preparatory and execution-related spaces in the empirical data and their models are not completely orthogonal, suggesting that this near-orthogonality serves an important mechanistic purpose. However, networks have no problem transferring activity between completely orthogonal subspaces. For example, the generator model in Fig. 8 of Elsayed, et al. (2016) is constrained to use completely orthogonal preparatory and execution-related subspaces. As the authors point out in the Discussion, such a strategy only works because the motor cortex received a large input just before movement (Kaufman et al., 2016).

    We thank the reviewer for this observation. We would like to stress the fact that we are not claiming that having an overlap between subspaces is necessary to transfer activity. Instead, our model shows that a small overlap between the maps can be exploited by the network to transfer activity between subspaces without requiring direction-specific external inputs right before movement execution. A solution where activity is transferred through feedforward inputs is also possible. Indeed, one of the observations of our work (which we highlight more in the new version of the paper) is that by looking at motor cortical activity only, we are not able to distinguish between the activity generated by a feedforward network, and one generated by a recurrent one. However, we argue that a solution where external inputs are minimized can be favorable from a metabolic point of view, as it requires fewer signals to be transmitted through long-range connections. This informs our cost function, and yields a solution where activity is transferred through recurrent connections, by exploiting the small correlation between subspaces.

  2. eLife assessment

    The study provides a recurrent network model of M1 for center-out reaches. The "neurons" in the model show uncorrelated tuning for movement direction during preparation and execution with dynamic transition between the two states. The continuous attractor model provides an interesting example of flexible switching between neural representations.

  3. Reviewer #1 (Public Review):

    This article is aimed at constructing a recurrent network model of the population dynamics observed in the monkey primary motor cortex before and during reaching. The authors approach the problem from a representational viewpoint, by (i) focusing on a simple center-out reaching task where each reach is predominantly characterised by its direction, and (ii) using the machinery of continuous attractor models to construct network dynamics capable of holding stable representations of that angle. Importantly, M1 activity in this task exhibits a number of peculiarities that have pushed the authors to develop important methodological innovations which, to me, give the paper most of its appeal. In particular, M1 neurons have dramatically different tuning to reach direction in the movement preparation and execution epochs, and that fact motivated the introduction of a continuous attractor model incorporating (i) two distinct maps of direction selectivity and (ii) distinct degrees of participation of each neuron in each map. I anticipate that such models will become highly relevant as neuroscientists increasingly appreciate the highly heterogeneous, and stable-yet-non-stationary nature of neural representations in the sensory and cognitive domains.

    As far as modelling M1 is concerned, however, the paper could be considerably strengthened by a more thorough comparison between the proposed attractor model and the (few) other existing models of M1 (even if these comparisons are not favourable they will be informative nonetheless). For example, the model of Kao et al (2021) seems to capture all that the present model captures (orthogonality between preparatory and movement-related subspaces, rotational dynamics, tuned thalamic inputs mostly during preparation) but also does well at matching the temporal structure of single-neuron and population responses (shown e.g. through canonical correlation analysis). In particular, it is not clear to me how the symmetric structure of connectivity within each map would enable the production of temporally rich responses as observed in M1. If it doesn't, the model remains interesting, as feedforward connectivity between more than two maps (reflecting the encoding of many more kinematic variables) or other mechanisms (such as proprioceptive feedback) could well explain away the observed temporal complexity of neural responses. Investigating such alternative explanations would of course be beyond the scope of this paper, but it is arguably important for the readers to know where the model stands in the current literature.

    Below is a summary of my view on the main strengths and weaknesses of the paper:

    1. From a theoretical perspective, this is a great paper that makes an interesting use of the multi-map attractor model of Romani & Tsodyks (2010), motivated by the change in angular tuning configuration from the preparatory epoch to the movement execution epoch. Continuous attractor models of angular tuning are often criticised for being implausibly homogeneous/symmetrical; here, the authors address this limitation by incorporating an extra dimension to each map, namely the degree of participation of each neuron (the distribution of which is directly extracted from data). This extension of the classical ring model seems long overdue! Another nice thing is the direct use of data for constraining the model's coupling parameters; specifically, the authors adjust the model's parameters in such a way as to match the temporal evolution of a number of "order parameters" that are explicitly manifested (i.e. observable) in the population recordings.

    I believe the main weakness of this continuous attractor approach is that it - perhaps unduly - binarises the configuration of angular tuning. Specifically, it assumes that while angular tuning switches at movement onset, it is otherwise constant within each epoch (preparation and execution). I commend the authors for carefully motivating this in Figure 2 (2e in particular), by showing that the circular variance of the distribution of preferred directions is higher across prep & move than within either prep or move. While this justifies a binary "two-map model" to first order, the analysis nevertheless shows that preferred directions do change, especially within the preparatory epoch. Perhaps the authors could do some bootstrapping to assess whether the observed dispersion of PDs within sub-periods of the delay epoch is within the noise floor imposed by the finite number of trials used to estimate tuning curves. If it is, then this considerably strengthens the model; otherwise, the authors should say that the binarisation reflects an approximation made for analytical tractability, and discuss any important implications.

    2. While it is great to constrain the model parameters using the data, there is a glaring "issue" here which I believe is both a weakness and a strength of the approach. The model has a lot of freedom in the external inputs, which leads to relatively severe parameter degeneracies. The authors are entirely forthright about this: they even dedicate a whole section to explaining that depending on the way the cost function is set up, the fit can land the model in very different regimes, yielding very different conclusions. The problem is that I eventually could not decide what to make of the paper's main results about the inferred external inputs, and indeed what to make of the main claim of the abstract. It would be great if the authors could discuss these issues more thoroughly than they currently do, and in particular, argue more strongly about the reasons that might lead one to favour the solutions of Fig 6d/g over that of Fig 6a. On the other hand, I see the proposed model as an interesting playground that will probably enable a more thorough investigation of input degeneracies in RNN models. Several research groups are currently grappling with this; in particular, the authors of LFADS (Pandarinath et al, 2018) and other follow-up approaches (e.g. Schimel et al, 2022) make a big deal of being able to use data to simultaneously learn the dynamics of a neural circuit and infer any external inputs that drive those dynamics, but everyone knows that this is a generally ill-posed problem (see also discussion in Malonis et al 2021, which the authors cite). As far as I know, it is not yet clear what form of regularisation/prior might best improve identifiability. While Bachschmid-Romano et al. do not go very far in dissecting this problem, the model they propose is low-dimensional and more amenable to analytical calculations, such that it provides a valuable playground for future work on this topic.

    3. As an addition to the motor control literature, this paper's main strengths lie in the model capturing orthogonality between preparatory and movement-related activity subspaces (Elsayed et al 2016), which few models do. However, one might argue that the model is in fact half hand-crafted for this purpose, and half-tuned to neural data, in such a way that it is almost bound to exhibit the phenomenon. Thus, some form of broader model cross-validation would be nice: what else does the model capture about the data that did not explicitly inspire/determine its construction? As a starting point, I would suggest that the authors apply the type of CCA-based analysis originally performed by Sussillo et al (2015), and compare qualitatively to both Sussillo et al. (2015) and Kao et al (2021). Also, as every recorded monkey M1 neuron can be characterized by its coordinates in the 4-dimensional space of angular tuning, it should be straightforward to identify the closest model neuron; it would be very compelling to show side-by-side comparisons of single-neuron response timecourses in model and monkey (i.e., extend the comparison of Fig S6 to the temporal domain).
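
    The suggested side-by-side comparison is straightforward to set up: for every recorded neuron, pick the model neuron closest in the (θ_A, θ_B, η_A, η_B) space, using a circular metric for the angles, and overlay the two response timecourses. A minimal sketch under those assumptions (the array names and weights are hypothetical, not from the paper):

```python
import numpy as np

def circ_dist(a, b):
    """Absolute angular distance, wrapped to [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (a - b))))

def match_model_neurons(theta_data, eta_data, theta_model, eta_model,
                        w_angle=1.0, w_eta=1.0):
    """Index of the closest model neuron for every recorded neuron.

    theta_*: (n, 2) preferred directions (theta_A, theta_B) in radians.
    eta_*:   (n, 2) participation strengths (eta_A, eta_B), ideally z-scored so
             the two distance terms are commensurate.
    """
    matches = np.empty(len(theta_data), dtype=int)
    for i in range(len(theta_data)):
        d_angle = circ_dist(theta_data[i][None, :], theta_model).sum(axis=1)
        d_eta = np.abs(eta_data[i][None, :] - eta_model).sum(axis=1)
        matches[i] = int(np.argmin(w_angle * d_angle + w_eta * d_eta))
    return matches

# Hypothetical usage: psth_model[match_model_neurons(...)[i]] plotted against
# psth_data[i] would extend the tuning-curve comparison of Fig. S6 to full
# response timecourses; a CCA between the two sets of trial-averaged PSTHs
# (as in Sussillo et al., 2015) would give the population-level counterpart.
```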

    4. The paper's clarity could be improved.

  4. Reviewer #2 (Public Review):

    The authors study M1 cortical recordings in two non-human primates performing straight delayed center-out reaches to one of 8 peripheral targets. They build a model for the data with the goal of investigating the interplay of inferred external inputs and recurrent synaptic connectivity, and their respective contributions to the encoding of preferred movement direction during the movement preparation and execution epochs. The model assumes that neurons encode movement direction via a cosine tuning that can differ between the preparation and execution epochs. As a result, each neuron in the model is described by four main properties: its preferred direction in the cosine tuning during the preparation (denoted by θ_A) and execution (denoted by θ_B) epochs, and the strength of its encoding of movement direction during the preparation (denoted by η_A) and execution (denoted by η_B) epochs. The authors assume that the activity of the neurons is generated by a recurrent network that can receive different inputs during the preparation and execution epochs. In the model, these inputs can be either internal to the network or external. The authors fit the model to real data by optimizing a loss that combines, via a hyperparameter α, the reconstruction of the cosine tunings with a cost that discourages/encourages the use of external inputs to explain the data. They study the solutions obtained for various values of α. The authors conclude that during the preparatory epoch, external inputs seem to be more important for reproducing the neurons' cosine tunings to movement direction, whereas during movement execution external inputs seem to be untuned to movement direction, with movement direction instead being encoded in the direction-specific recurrent connections of the network.
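
    For concreteness, the tuning assumption summarized above can be written (in a hedged reading; the paper's equations may include additional baseline, gain, and noise terms) as

        r_i ≈ r_0,i + η_A,i cos(θ_mov - θ_A,i)   during movement preparation,
        r_i ≈ r_0,i + η_B,i cos(θ_mov - θ_B,i)   during movement execution,

    where θ_mov is the instructed reach direction, θ_A,i and θ_B,i are neuron i's preferred directions in the two epochs, and η_A,i, η_B,i set how strongly it participates in each map.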

    Major:

    1. Fundamentally, without actually simultaneously recording the activity of upstream regions, it should not be possible to rule out that the seemingly recurrent contributions to the M1 activity are actually due to external inputs to M1. I think it should be acknowledged in the discussion that the inferred external inputs here depend on the assumptions of the model and provide hypotheses to be validated in future experiments that actually record from upstream regions. To convey with an example why I think it is critical to simultaneously record from upstream regions to confirm these conclusions, consider two alternative scenarios: I) The recorded neurons in M1 have some recurrent connections that generate a pattern of activity that, based on the modeling, seems to be recurrent. II) The exact same activity has been recorded from the same M1 neurons, but these neurons have absolutely no recurrent connections themselves, and are rather activated via purely feed-forward connections from some upstream region; that upstream region has recurrent connections and is generating the recurrent-like activity that is later echoed in M1. These two scenarios can produce the exact same M1 data, so they should not be distinguishable purely based on the M1 data. To distinguish them, one would need to simultaneously record from upstream regions to see whether the same recurrent-like patterns that are seen in M1 were already generated in an upstream region or not. I think acknowledging this major limitation, and discussing the need to eventually confirm the conclusions of this modeling study with actual simultaneous recordings from upstream regions, is critical.

    2. The ring network model used in this work implicitly relies on the assumption that cosine tuning models are good representations of the recorded M1 neuronal activity. However, this assumption is not quantitatively validated in the data. Given that all conclusions depend on this, it would be important to provide some goodness of fit measure for the cosine tuning models to quantify how well the neurons' directional preferences are explained by cosine tunings. For example, reporting a histogram of the cosine tuning fit error over all neurons in Fig 2 would be helpful (currently example fits are shown only for a few neurons in Fig. 2 (a), (b), and Figure S6(b)). This would help quantitatively justify the modeling choice.
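
    The requested goodness-of-fit summary is simple to compute: regress each neuron's trial-averaged rates onto the cosine and sine of target direction within an epoch, and report the per-neuron R². A minimal sketch, assuming a hypothetical array of trial-averaged rates per direction (not the authors' code):

```python
import numpy as np

def cosine_fit_r2(tuning, directions):
    """R^2 of a cosine-tuning fit, one value per neuron.

    tuning: (n_neurons, n_directions) trial-averaged rates per reach target.
    directions: (n_directions,) target directions in radians (e.g., the 8 targets).
    """
    X = np.column_stack([np.ones_like(directions), np.cos(directions), np.sin(directions)])
    coef, *_ = np.linalg.lstsq(X, tuning.T, rcond=None)   # (3, n_neurons)
    pred = (X @ coef).T                                    # (n_neurons, n_directions)
    ss_res = np.sum((tuning - pred) ** 2, axis=1)
    ss_tot = np.sum((tuning - tuning.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return 1.0 - ss_res / ss_tot

# A histogram of cosine_fit_r2(tuning_prep, dirs) and cosine_fit_r2(tuning_move, dirs)
# over all recorded neurons would quantify how well cosine tuning describes the data,
# complementing the example fits in Fig. 2 and Fig. S6(b).
```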

    3. The authors explain that the two-cylinder model that they use has "distinct but correlated" maps A and B during the preparation and movement epochs. This is hard to see in the formulation. It would be helpful if the authors could expand in the Results on what they mean by "correlation" between the maps and on which part of the model enforces the correlation.

    4. The authors note that a key innovation in the model formulation here is the addition of participation strength parameters (η_A, η_B) to prior two-cylinder models, representing the degree of each neuron's participation in the encoding of the circular variable in either map. The authors state that this is critical for explaining the cosine tunings well: "We have discussed how the presence of this dimension is key to having tuning curves whose shape resembles the one computed from data, and decreases the level of orthogonality between the subspaces dedicated to the preparatory and movement-related activity". However, I am not sure where this is discussed. To me, it seems that to show an additional parameter is necessary to explain the data well, one would need to compare the fit to data of the model with that parameter against a model without it. I don't think such a comparison was provided in the paper. It is important to include such a comparison to quantitatively demonstrate the benefit of the novel element of the model.
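
    One way to run the comparison asked for here is a nested-model test: fit the tuning curves once with a per-neuron modulation depth (the role played by η) and once with a single shared depth, and compare (ideally cross-validated) errors. A rough sketch under those assumptions; note that the shared-amplitude variant below simply clamps all depths to the population mean rather than refitting them jointly, so it is only a crude baseline, not the authors' method:

```python
import numpy as np

def cosine_predictions(tuning, directions, shared_amplitude=False):
    """Cosine-tuning predictions with per-neuron or shared modulation depth.

    tuning: (n_neurons, n_directions) trial-averaged rates per reach target.
    directions: (n_directions,) target directions in radians.
    shared_amplitude=True mimics a model without per-neuron participation
    strengths (all eta set to the population mean depth).
    """
    X = np.column_stack([np.cos(directions), np.sin(directions)])
    baseline = tuning.mean(axis=1, keepdims=True)
    coef, *_ = np.linalg.lstsq(X, (tuning - baseline).T, rcond=None)   # (2, n_neurons)
    eta = np.hypot(coef[0], coef[1])                                   # modulation depth
    phi = np.arctan2(coef[1], coef[0])                                 # preferred direction
    if shared_amplitude:
        eta = np.full_like(eta, eta.mean())
    return baseline + eta[:, None] * np.cos(directions[None, :] - phi[:, None])

# Comparing the error of the two settings on held-out trial averages, e.g.
#   np.mean((tuning_test - cosine_predictions(tuning_train, dirs, False)) ** 2)
#   np.mean((tuning_test - cosine_predictions(tuning_train, dirs, True)) ** 2)
# would quantify how much the extra eta dimension actually improves the fit to data.
```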

    5. The model parameters are fitted by minimizing a total cost that is a weighted sum of two costs, E_tot = α E_rec + E_ext, with the hyperparameter α determining how the two costs are combined. The selection of α is key in determining how much the model relies on external inputs to explain the cosine tunings in the data. As such, the conclusions of the paper rely on a clear justification of the selection of α and a clear discussion of its effect. Otherwise, all conclusions can be confounded by this arbitrary selection and thus be unreliable. Most importantly, I think there should be a quantitative fit-to-data measure that is reported for different scenarios to allow comparison between them (also see comment 2). For example, when arguing that α should be "chosen so that the two terms have equal magnitude after minimization", this would be convincing if that selection somehow resulted in a better fit to the neural data compared with other values of α. If all such selections of α fit the neural data similarly well, then how can the authors argue that some are more appropriate than others? This is critical since small changes in α can lead to completely different conclusions (Fig. 6, see my next two comments).
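
    One concrete way to provide the requested fit-to-data comparison is to score every candidate α on data not used for fitting, e.g., held-out trials, and ask whether held-out reconstruction of the tuning curves actually distinguishes between the regimes. A minimal, hypothetical sketch of such a selection loop (`fit_model` stands in for the paper's optimization and is not part of the paper):

```python
import numpy as np

def held_out_error(pred_tuning, true_tuning):
    """Mean squared error between predicted and empirical held-out tuning curves."""
    return float(np.mean((pred_tuning - true_tuning) ** 2))

def score_alphas(alphas, fit_model, train_data, test_tuning):
    """Held-out reconstruction error for each candidate alpha.

    fit_model(alpha, train_data) stands in for the paper's optimization and is
    assumed to return predicted tuning curves of shape (n_neurons, n_directions).
    """
    return {a: held_out_error(fit_model(a, train_data), test_tuning) for a in alphas}

# Hypothetical usage:
#   scores = score_alphas([0.1, 0.5, 1.0, 2.0, 10.0], fit_model, train_data, test_tuning)
# If the held-out error is essentially flat in alpha, the neural data alone do not
# favour one regime over another (e.g., Fig. 6a vs. Fig. 6d/g), which is precisely
# the degeneracy at issue; if it is not flat, the minimum gives a principled choice.
```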

    6. The authors seem to select α based on the following: "The hyperparameter α was chosen so that the two terms have equal magnitude after minimization (see Fig. S4 for details)". Why is this the appropriate choice? The authors explain that this will lead to the behavior of the model being close to the "bifurcation surface", but why is that the appropriate choice? Does it result in a better fit to neural data compared with other choices of α? It is critical to clarify and justify this choice, as again all conclusions hinge on it.

    7. Fig. 6 shows example solutions for two close values of α, and how even slight changes in the selection of α can change the conclusions. In Fig. 6 (d-e-f), α is chosen following the default approach, such that the two terms E_rec and E_ext have equal magnitude. Here, as the authors note, tuned external inputs are zero during movement execution. In contrast, in Fig. 6 (g-h-i), α is chosen so that the E_rec term has a "slightly larger weight" than the E_ext term, so that there is less penalty for using large external inputs. This leads to a different conclusion, whereby "a small input tuned to θ_B is present during movement execution". Does one value of α fit the neural data better? Otherwise, how do the authors justify key conclusions such as the following, which seems to be based on the first choice of α shown in Fig. 6 (d-e-f): "...observed patterns of covariance are shaped by external inputs that are tuned to neurons' preferred directions during movement preparation, and they are dominated by strong direction-specific recurrent connectivity during movement execution".

    8. It would be informative to see the extreme cases of very large and very small α. For example, if α is very large, such that external inputs are practically not penalized, would the model rely purely on external inputs (rather than recurrent inputs) to explain the tuning curves? This would be an example of the hypothetical scenario mentioned in my first comment. Would this result in a worse fit to neural data?

    9. The authors argue in the discussion that "the addition of an external input strength minimization constraint breaks the degeneracy of the space of solutions, leading to a solution where synaptic couplings depend on the tuning properties of the pre- and post-synaptic neurons, in such a way that in the absence of a tuned input, neural activity is localized in map B". In other words, the use of the E_ext term apparently reduces the "degeneracy" of the solution. This was not clear to me, and I'm not sure where it is explained. This is also related to α: if α goes toward very large values, it is as if the E_ext term were removed, so the authors seem to be saying that the solution becomes degenerate as α grows very large. This should be clarified.

    10. How do the authors justify setting Φ_A = Φ_B in equation (5)? In other words, how is the last assumption in the following sentence justified: "To model the data, we assumed that the neurons are responding both to recurrent inputs and to fluctuating external inputs that can be either homogeneous or tuned to θ_A; θ_B, with a peak at constant location Φ_A = Φ_B ≡ Φ". Does this mean that the preferred direction for a given neuron is the same during preparation and movement epochs? If so, how is this consistent with the not-so-high correlation between the preferred directions of the two epochs shown in Fig. 2 c, which is reported to have a circular correlation coefficient of 0.4?

  5. Reviewer #3 (Public Review):

    In this work, Bachschmid-Romano et al. propose a novel model of the motor cortex, in which the evolution of neural activity throughout movement preparation and execution is determined by the kinematic tuning of individual neurons. Using analytic methods and numerical simulations, the authors find that their networks share some of the features found in empirical neural data (e.g., orthogonal preparatory and execution-related activity). While the possibility of a simple connectivity rule that explains large features of empirical data is intriguing and would be highly relevant to the motor control field, I found it difficult to assess this work because of the modeling choices made by the authors and how the results were presented in the context of prior studies.

    Overall, it was not clear to me why Bachschmid-Romano et al. couched their models within a cosine-tuning framework and whether their results could apply more generally to more realistic models of the motor cortex. Under cosine-tuning models (or kinematic encoding models, more generally), the role of the motor cortex is to represent movement parameters so that they can presumably be read out by downstream structures. Within such a framework, the question of how the motor cortex maintains a stable representation of movement direction throughout movement preparation and execution when the tuning properties of individual neurons change dramatically between epochs is highly relevant. However, prior work has demonstrated that kinematic encoding models provide a poor fit for empirical data. Specifically, simple encoding models (and the more elaborate extensions [e.g., Inoue, et al., 2018]) cannot explain the complexity of single-neuron responses (Churchland and Shenoy, 2007), do not readily produce the population-level signals observed in the motor cortex (Michaels, Dann, and Scherberger, 2016), and cannot be extended to more complex movements (Russo, et al., 2018).

    In both the Introduction and Discussion, the authors heavily cite an alternative to kinematic encoding models, the dynamical systems framework. Here, the correlations between kinematics and neural activity in the motor cortex are largely epiphenomenal. The motor cortex does not 'represent' anything; its role is to generate patterns of muscle activity. While the authors explicitly acknowledge the shortcomings of encoding models ('Extension to modeling richer movements', Discussion) and claim that their proposed model can be extended to 'more realistic scenarios', they demonstrate neither that their models can produce patterns of muscle activity nor that they generate realistic patterns of neural activity. The authors should either fully characterize the activity in their networks and make the argument that their models provide a better fit to empirical data than alternative models, or demonstrate that more realistic computations can be explained by the proposed framework.

    Major Comments
    1. In the present manuscript, it is unclear whether the authors are arguing that representing movement direction is a critical computation that the motor cortex performs and that the proposed models are accurate models of the motor cortex, or whether directional coding is being used as a 'proof of concept' that demonstrates how specific, population-level computations can be explained by the tuning of individual neurons.
    If the authors are arguing the former, then they need to demonstrate that their models generate activity similar to what is observed in the motor cortex (e.g., realistic PSTHs and population-level signals). Presently, the manuscript only shows tuning curves for six example neurons (Fig. S6) and a single jPC plane (Fig. S8). Regarding the latter, the authors should note that Michaels et al. (2016) demonstrated that representational models can produce rotations that are superficially similar to empirical data, yet are not dependent on maintaining an underlying condition structure (unlike the rotations observed in the motor cortex).
    If the authors are arguing the latter - and they seem to be, based on the final section of the Discussion - then they need to demonstrate that their proposed framework can be extended to what they call 'more realistic scenarios'. For example, could this framework be extended to a network that produces patterns of muscle activity?

    2. Related to the above point, the authors claim in the Abstract that their models 'recapitulate the temporal evolution of single-unit activity', yet the only evidence they present is the tuning curves of six example units. Similarly, the authors should more fully characterize the population-level signals in their networks. The inferred inputs (Fig. 6) indeed seem reasonable, yet I'm not sure how surprising this result is. Weren't the authors guaranteed to infer a large, condition-invariant input during movement and a condition-specific input during preparation simply because of the shape of the order parameters estimated from the data (Fig. 6c, thin traces)?

    3. In the Abstract and Discussion (first paragraph), the authors highlight that the preparatory and execution-related spaces in the empirical data and their models are not completely orthogonal, suggesting that this near-orthogonality serves an important mechanistic purpose. However, networks have no problem transferring activity between completely orthogonal subspaces. For example, the generator model in Fig. 8 of Elsayed, et al. (2016) is constrained to use completely orthogonal preparatory and execution-related subspaces. As the authors point out in the Discussion, such a strategy only works because the motor cortex receives a large input just before movement (Kaufman et al., 2016).