BrainPy, a flexible, integrative, efficient, and extensible framework for general-purpose brain dynamics programming


Abstract

Elucidating the intricate neural mechanisms underlying brain functions requires integrative brain dynamics modeling. To facilitate this process, it is crucial to develop a general-purpose programming framework that allows users to freely define neural models across multiple scales, efficiently simulate, train, and analyze model dynamics, and conveniently incorporate new modeling approaches. In response to this need, we present BrainPy. BrainPy leverages the advanced just-in-time (JIT) compilation capabilities of JAX and XLA to provide a powerful infrastructure tailored for brain dynamics programming. It offers an integrated platform for building, simulating, training, and analyzing brain dynamics models. Models defined in BrainPy can be JIT compiled into binary instructions for various devices, including central processing units (CPUs), graphics processing units (GPUs), and tensor processing units (TPUs), which ensures high running performance comparable to native C or CUDA. Additionally, BrainPy features an extensible architecture that allows for easy expansion of new infrastructure, utilities, and machine-learning approaches. This flexibility enables researchers to incorporate cutting-edge techniques and adapt the framework to their specific needs.

Article activity feed

  1. Author Response

    Reviewer #1

    While the article clearly outlines the strengths of the chosen approach, it lacks an equally clear exposition of its limitations and a more thorough comparison to established approaches. Two examples of limitations that should be stated more clearly, in my opinion: models need to be small enough to fit on a single machine (in contrast to e.g. NEURON and NEST which support distributed computation via MPI), and only single-compartment models are supported; both limitations are mentioned in passing in the discussion, but would merit a more upfront mention.

    We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

    1. The Limitations subsection (L448 - L491) acknowledges restrictions of the BrainPy paradigm, which is built on Python-based object-oriented programming. It highlights three main categories of limitations: (a) approach limitations, (b) functionality limitations, and (c) parallelization limitations. These limitations highlight areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

    2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems.

    In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

    The study does not verify the accuracy of the presented framework. While its basic approach (time-step-based simulation, standard numerical integration algorithms) is sufficiently similar to other software to not expect major discrepancies, an explicit comparison would remove any doubt. Quantitative measures of accuracies are particularly important in the context of benchmarks (see below), since simulations can be made arbitrarily fast by sacrificing performance.

    We agree that an explicit comparison would help alleviate any doubts and provide a more comprehensive understanding of our framework's accuracy. We have revised our manuscript to include a dedicated section, particularly Appendix 11. In this section, we verified that all simulators generated consistent average firing rates for the given benchmark network models (figure 1 and figure 2 in Appendix 11). These verifications were performed under different network sizes (ranging from 4×10^3 to 4×10^5 neurons) and different computing platforms (CPU, GPU, and TPU). We also qualitatively compared the overall network activity patterns produced by each simulator to ensure they exhibited the same dynamics (figure 3 and figure 4 in Appendix 11). While exact spike-to-spike reproducibility was not guaranteed between different simulator implementations, we confirmed that our simulator produced activity consistent with the reference simulators for both firing rates and network-level dynamics. Moreover, BrainPy did not sacrifice simulation accuracy for speed: despite using single-precision floating point, it produced consistent firing rates and raster diagrams across all simulations (see figure 3 and figure 4 in Appendix 11).
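    For readers who wish to reproduce such checks, the rate comparison reduces to a simple computation over the recorded spike rasters. A minimal sketch in plain NumPy follows; the function and variable names are ours for illustration, not taken from Appendix 11:

    import numpy as np

    def mean_firing_rate(spikes, dt):
        # spikes: (num_steps, num_neurons) boolean array of spike events
        # dt: simulation time step in ms
        num_steps, num_neurons = spikes.shape
        duration_s = num_steps * dt / 1000.0  # total simulated time in seconds
        return spikes.sum() / (num_neurons * duration_s)  # population rate in Hz

    # Two simulators agree if their population rates match within a tolerance.
    rng = np.random.default_rng(0)
    raster_a = rng.random((10000, 400)) < 0.001  # ~10 Hz at dt = 0.1 ms
    raster_b = rng.random((10000, 400)) < 0.001
    assert abs(mean_firing_rate(raster_a, 0.1) - mean_firing_rate(raster_b, 0.1)) < 1.0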

    We hope these revisions can ensure that our manuscript provides a clear and robust validation of the accuracy of our simulator.

    Benchmarking against other software is obviously important, but also full of potential pitfalls. The current article does not state clearly whether the results are strictly comparable. In particular: are the benchmarks on the different simulators calculating results to the same accuracy (use of single or double precision, same integration algorithm, etc.)? Does each simulator use the fastest possible execution mode (e.g. number of threads/processes for NEST, C++ standalone mode in Brian2, etc.)? What is exactly measured (compilation time, network generation time, simulation execution time, ...) - these components will scale differently with network size and simulation duration, so summing them up makes the results difficult to interpret. Details are also missing for the comparison between the XLA operator customization in C++ vs. Python: was the C++ variant written by the authors or by someone else? Does the NUMBA→XLA mechanism also support GPUs/TPUs? This comparison also seems to be missing from the GitHub repository provided for reproducing the paper results.

    We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

    1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. Particularly, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and experimental details for the benchmark comparisons.

    2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

    3. Clarification on Measured Time: In the revised paper, we state that all simulations measured only the model execution time, excluding model construction time, synapse creation time, and compilation time; see "Performance measurements" in Appendix 11 (L1539 - L1548). A generic illustration of separating compilation time from execution time appears after this list.

    4. Consideration of Acceleration Modes: In the revised version, we provide the simulation speed of other brain simulators under different acceleration modes; see Figure 8. For instance, we use the fastest possible option --- the C++ standalone mode in Brian2 --- for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, ensuring a fair and accurate comparison.

    5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
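    Regarding point 3, in JAX-based frameworks such as BrainPy, compilation time can be excluded from timing by triggering compilation with a warm-up call before starting the clock. The following is a minimal generic illustration in plain JAX (a stand-in toy function, not the actual benchmark script):

    import time
    import jax
    import jax.numpy as jnp

    @jax.jit
    def step(state):
        # stand-in for one compiled update step of a network model
        return state * 0.99 + 0.01

    state = jnp.zeros(1000)
    state = step(state).block_until_ready()  # first call compiles; excluded from timing

    t0 = time.perf_counter()
    for _ in range(1000):
        state = step(state)
    state.block_until_ready()  # wait for asynchronous dispatch before stopping the clock
    print(f"execution time: {time.perf_counter() - t0:.3f} s")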

    Regarding the comparison between XLA operator customization in C++ and Python, we utilized our self-implemented C++ version, which is available in Appendix 8, Listing 2. Presently, the NUMBA→XLA mechanism does not support GPUs/TPUs; however, we are working on expanding this capability to other platforms. We have clarified this in the revised manuscript as well (see L1278 - L1285).
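    To give a flavor of what this pathway involves, below is an illustrative event-driven CPU kernel of the kind one would register through the Numba-based interface; the function name and argument layout are hypothetical placeholders, and the actual registration API is documented in Appendix 8:

    import numba

    @numba.njit
    def event_csr_update(post_current, spikes, indices, indptr, weight):
        # Event-driven sparse update: only presynaptic neurons that spiked
        # propagate their weight to postsynaptic targets (CSR connectivity).
        for i in range(spikes.shape[0]):
            if spikes[i]:
                for k in range(indptr[i], indptr[i + 1]):
                    post_current[indices[k]] += weight

    A kernel like this runs as a compiled XLA custom call on CPU; the GPU/TPU extension mentioned above would require equivalent CUDA implementations.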

    While the authors convincingly argue for the merits of their Python-based/object-oriented approach, in my opinion, they do not fully acknowledge the advantages of domain-specific languages (NMODL, NestML, equation syntax of ANNarchy and Brian2, ...). In particular, such languages aim at a strong decoupling of the mathematical model description from its implementation and other parts of the model. In contrast, models described with BrainPy's approach often need to refer to such details, e.g. be aware of differences between dense and sparse connectivity schemes, online, or batch mode, etc. It might also be worth mentioning descriptive approaches to synaptic connectivity as supported by other simulators (connection syntax in Brian2, Connection Set Algebra for NEST).

    We have made revisions to better acknowledge the merits of DSLs while providing a more comprehensive comparison. These revisions are incorporated in Discussion (L452 - L466) and Appendix 1 (L778 - L788).
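    For context, BrainPy itself also exposes a declarative connectivity layer that separates the description of connectivity from its realization. A minimal sketch of the connector interface follows; the population sizes here are arbitrary choices of ours:

    import brainpy as bp

    conn = bp.conn.FixedProb(prob=0.02)   # declarative description, independent of sizes
    conn(pre_size=4000, post_size=1000)   # bind the description to concrete populations
    pre_ids, post_ids = conn.require('pre_ids', 'post_ids')  # materialize one representation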

    Reviewer #2

    While the results presented are impressive, publishing further details of the benchmarks in an appendix would be helpful for evaluating the claims and the overall conclusion would be more convincing if the performance benefits were demonstrated on a wider selection of test cases. Unsatisfyingly, the authors gave up on making a direct comparison to Brian running on GPUs with GeNN which would have been a fairer comparison than CPU-based simulations. The code for the chosen benchmarks is also likely to be highly optimised by the authors for running on BrainPy but less so for the other platforms - a fairer test would be to invite the authors of the other simulators to optimise the same models and re-evaluate the benchmarks.

    We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

    1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. Particularly, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and experimental details for the benchmark comparisons.

    2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

    3. Clarification on Measured Time: In the revised paper, we state that all simulations measured only the model execution time, excluding model construction time, synapse creation time, and compilation time; see "Performance measurements" in Appendix 11 (L1539 - L1548).

    4. Consideration of Acceleration Modes: In the revised version, we provide the simulation speed of other brain simulators under different acceleration modes; see Figure 8. For instance, we use the fastest possible option --- the C++ standalone mode in Brian2 --- for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, ensuring a fair and accurate comparison.

    5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).

    Regarding the wider selection of test cases, we understand the importance of demonstrating the performance benefits on a broader range of scenarios. Particularly, we have designed two kinds of benchmark models:

    • Sparse connection models. This category includes the COBA-LIF and COBA-HH networks. The former is a standard E/I balanced network for comparing the simulation speed of brain simulators, while the latter uses the computationally expensive Hodgkin-Huxley (HH) neuron model as its elements. Both models effectively demonstrate a simulator's capability for sparse, event-driven computation.

    • Dense connection models. The local circuits of a cortical column are usually densely connected (Science 366, 1093). In particular, we use the decision-making network proposed by Wang (2002) for evaluations.

    In the revised version, we include extensive experiments on these three benchmark models (COBA-LIF, COBA-HH, and the decision-making network) on different computing platforms (CPU, GPU, and TPU) to strengthen the overall conclusion and provide a more comprehensive evaluation of our approach.
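    To make the benchmark setup concrete, the COBA-LIF model follows the standard Vogels-Abbott E/I architecture. A condensed sketch in the BrainPy 2.x object-oriented API is shown below with the conventional parameter values; the exact benchmark scripts live in the linked reproduction repository and may differ in detail:

    import brainpy as bp

    class EINet(bp.Network):
        def __init__(self, scale=1.0):
            super().__init__()
            pars = dict(V_rest=-60., V_th=-50., V_reset=-60., tau=20., tau_ref=5.)
            self.E = bp.neurons.LIF(int(3200 * scale), **pars)  # excitatory population
            self.I = bp.neurons.LIF(int(800 * scale), **pars)   # inhibitory population
            # conductance-based exponential synapses with 2% random connectivity
            self.E2E = bp.synapses.Exponential(self.E, self.E, bp.conn.FixedProb(0.02),
                                               g_max=0.6, tau=5., output=bp.synouts.COBA(E=0.))
            self.E2I = bp.synapses.Exponential(self.E, self.I, bp.conn.FixedProb(0.02),
                                               g_max=0.6, tau=5., output=bp.synouts.COBA(E=0.))
            self.I2E = bp.synapses.Exponential(self.I, self.E, bp.conn.FixedProb(0.02),
                                               g_max=6.7, tau=10., output=bp.synouts.COBA(E=-80.))
            self.I2I = bp.synapses.Exponential(self.I, self.I, bp.conn.FixedProb(0.02),
                                               g_max=6.7, tau=10., output=bp.synouts.COBA(E=-80.))

    net = EINet(scale=1.0)
    runner = bp.DSRunner(net, monitors=['E.spike'],
                         inputs=[('E.input', 20.), ('I.input', 20.)])  # constant external drive
    runner.run(1000.)  # simulate 1000 ms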

    Regarding the comparison to Brian running on GPUs with GeNN, we apologize for not including it in our initial submission. We have conducted the necessary experiments on all three benchmark models used in our evaluations and include these results in the revised version of the paper (see Figure 8). This addition will enhance the credibility of our findings and allow for a more meaningful comparison between different simulation platforms. Furthermore, we have also reached out to the authors of other simulators and invited them to optimize the same models used in our benchmarks. We believe this collaborative approach will ensure a more equitable evaluation of the simulators and provide a more robust and convincing analysis of our work.

    Furthermore, the manuscript reads like an advertisement for the platform with very little discussion of its limitations, weaknesses, or directions for further improvement. A more frank and balanced perspective would strengthen the manuscript and give the reader greater confidence in the platform.

    We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

    1. The Limitations subsection (L448 - L491) acknowledges restrictions of the BrainPy paradigm, which is built on Python-based object-oriented programming. It highlights three main categories of limitations: (a) approach limitations, (b) functionality limitations, and (c) parallelization limitations. These limitations highlight areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

    2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems. In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

    Since simulators wax and wane in popularity, it would be reassuring to see a roadmap for development with a proposed release cadence and a sustainable governance policy for the project. This would serve to both clearly indicate the areas of active development where community contributions would be most valuable and also to reassure potential users that the project is unlikely to be abandoned in the near future, ensuring that their time investment in learning to use the framework will not be wasted.

    We appreciate the reviewer raising the point about demonstrating the project's sustainability. In response to this feedback, we have made the following efforts.

    Firstly, we have added and will maintain a "Development roadmap" section on the BrainPy GitHub homepage (https://github.com/brainpy/BrainPy). This will enable the community to have a clear understanding of the project's direction and the areas of active development. Additionally, the "Future work" section in our revised paper outlines a comprehensive roadmap for the next stages of BrainPy development.

    Secondly, to address the concern about the sustainability of our project and the potential risk of abandonment, we have added an ACKNOWLEDGMENTS.md file on GitHub (https://github.com/brainpy/BrainPy/blob/master/ACKNOWLEDGMENTS.md) to outline our sustainable funding support. This support demonstrates our commitment to the long-term maintenance and development of the project and may help dispel users' doubts about project abandonment.

    Similarly, a complex set of dependencies, which need to be modified for BrainPy, will likely make the project hard to maintain and so a similar plan to those given for the CI pipeline and documentation generation for automation of these modifications would be a good addition. It is also important to periodically reflect on whether it still makes sense to combine all the disparate tools into one framework as the codebase grows and starts to strain under modifications required to maintain its unification.

    We appreciate the reviewer's valuable suggestions on the BrainPy framework.

    First, BrainPy is a self-contained package designed specifically for brain dynamics programming. It has minimal dependencies, relying only on fundamental packages within the Python scientific computing ecosystem. In essence, BrainPy relies on numpy for array-based computations and utilizes jax and jaxlib for JIT compilation. While we currently use numba to customize dedicated operators, we could also remove this dependency by rewriting these operators in C++. We additionally rely on brainpylib, a package we developed ourselves, which provides dedicated operators for CPUs and GPUs in the context of brain dynamics modeling. Finally, BrainPy leverages mature solutions within the field for certain auxiliary functions: for instance, we use tqdm to display a progress bar during model execution and matplotlib for visualization, capitalizing on its well-established capabilities in the scientific community.

    Second, we agree that there is a risk of overly complex dependencies and architectural strain. To mitigate this risk, we have taken the following steps:

    • We prioritize good software engineering practices such as loose coupling, high cohesion, and modularity in the framework design. This isolates dependencies and confines changes to specific components. For example, the brainpy.visualize module defines abstract visualization functions whose backend can be swapped at any time.

    • We invest in automating the build, test, and release process to relieve manual maintenance burdens. We make heavy use of GitHub Actions for testing BrainPy code and building documentation.

    • We document dependencies clearly and maintain backwards compatibility where possible. New APIs clearly state the BrainPy version from which they are supported, and deprecated APIs are phased out over multiple release cycles.

    • We continuously monitor code complexity metrics and refactor/simplify the architecture when needed.

    • When new tools have significantly different requirements, we will consider spinning them off into separate projects rather than forcing them into the core framework.

    Finally, a live demonstration would be a very useful addition to the project. For example, a Jupyter notebook hosted on mybinder.org or similar, and a fully configured Docker image, would each enable potential users to quickly experiment with BrainPy without having to install a stack of dependencies and troubleshoot version conflicts with their pre-existing setup. This would greatly lower the barrier to adoption and help to convince a larger base of modellers of the potential merits of BrainPy, which could be major, both in terms of the computational speed-up and ease of development for a wide range of modelling paradigms.

    We appreciate the reviewer's valuable feedback and suggestion. We have hosted a Jupyter notebook and a fully configured Docker image on mybinder.org (https://mybinder.org/v2/gh/brainpy/BrainPy-binder/main). Users can easily experiment with BrainPy without the need to install multiple dependencies or troubleshoot version conflicts.

    Reviewer #3

    One potential issue is that the scope of the neuro-simulator is not very clearly explained and the target audience is not well defined: is BrainPy primarily intended for computational neuroscientists or for neuro-AI practitioners? The simulator offers very detailed neural models (HH, fractional order models), classical point-models (LIF, AdEx), rate-coded models (reservoirs), but also deep learning layers (Conv, MaxPool, BatchNorm, LSTM). Is there an advantage to using BrainPy rather than PyTorch for purely deep networks? Is it possible to build hybrid models combining rate-coded reservoirs or convnets with a network of HH neurons? Without such a hybrid approach, it is unclear why the deep learning layers are needed.

    We appreciate the reviewer's concern regarding the scope of BrainPy and the need for clarification regarding the target audience.

    BrainPy is designed to cater to both computational neuroscientists and neuro-AI practitioners by integrating detailed neural models, classical point models, rate-coded models, and deep learning models. The platform aims to provide a general-purpose programming framework for modeling brain dynamics, allowing users to explore the dynamics of brain or brain-inspired models that combine insights from biology and machine learning.

    Particularly, brain dynamics models (provided in the brainpy.dyn module) and deep learning models (provided in the brainpy.dnn module) are closely integrated in BrainPy. First, to build brain dynamics models, users can use the building blocks in the brainpy.dnn module to create synaptic projections.

    Second, to build brain-inspired computing models for machine learning, users can also take advantage of the neuronal and synaptic dynamics provided in the brainpy.dyn module.

    To that end, BrainPy provides building blocks of detailed conductance-based models like Hodgkin-Huxley, as well as common deep learning layers like convolutions.
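    As a toy illustration of this integration, a hybrid model might feed a trainable dense layer (from brainpy.dnn) into a spiking population (from brainpy.dyn). The sketch below assumes the 2.x-style class names bp.dnn.Linear and bp.dyn.Lif; the layer sizes and neuron parameters are arbitrary:

    import brainpy as bp
    import brainpy.math as bm

    class HybridNet(bp.DynamicalSystem):
        def __init__(self, num_in, num_hidden):
            super().__init__()
            self.proj = bp.dnn.Linear(num_in, num_hidden)  # deep-learning layer
            self.pop = bp.dyn.Lif(num_hidden, V_rest=-60., V_reset=-60.,
                                  V_th=-50., tau=20.)      # spiking population

        def update(self, x):
            current = self.proj(x)    # dense projection of the input
            return self.pop(current)  # one step of spiking dynamics

    net = HybridNet(num_in=100, num_hidden=200)
    spikes = net(bm.random.rand(100))  # one update step with a random input vector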

    Regarding the advantage of using BrainPy over PyTorch for purely deep networks, we acknowledge that existing deep learning libraries like Flax in the JAX ecosystem provide extensive tools and examples for constructing traditional deep neural networks. While BrainPy does implement standard deep learning layers, our primary focus is not to compete directly with those libraries. Instead, we provide these models to enable the seamless integration of deep learning layers with BrainPy's core modeling abstractions, including variables and dynamical systems. This integration allows researchers to incorporate common deep learning layers into their brain models. Additionally, the inclusion of deep learning layers in BrainPy serves as examples for customization and facilitates the development of tailored layers for neuroscience research. Researchers can modify or extend the implementations to suit their specific needs.

    In summary, BrainPy's scope focuses on the general-purpose brain dynamics programming. The target audience includes computational neuroscientists who want to incorporate insights from machine learning, as well as some ML researchers interested in integrating brain-like components.

    In terms of plasticity, only external training procedures are implemented (backpropagation, FORCE, surrogate gradients). No local plasticity mechanism (Hebbian learning for rate-coded networks, STDP and its variants for spiking networks) seems to be implemented, apart from STP. Is it a planned feature? Appendix 8 refers to bp.synplast.STDP(), but it is not present in the current code (https://github.com/brainpy/BrainPy/tree/master/brainpy/_src/dyn/synplast). Spiking networks without STDP are not going to be very useful to computational neuroscientists, so this suggests that the simulator targets primarily neuro-AI, i.e. AI researchers interested in using spiking models in a machine learning approach.

    We appreciate the reviewer raising the limitations of BrainPy in terms of local plasticity mechanisms, and we apologize for the delay in implementing STDP models in BrainPy. Currently, we provide a very general implementation of STDP. It is compatible with any synaptic model (such as exponential, dual-exponential, AMPA, GABA, and NMDA dynamics) and with common connection patterns (such as dense and sparse connectivity):

    bp.dyn.STDP_Song2001(pre, post, delay, syn, comm, out)

    It can also easily be combined with short-term plasticity models. The modular design of BrainPy's framework makes plasticity components straightforward to implement and integrate into existing models.
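    For illustration, a fuller instantiation of the constructor quoted above might look as follows; the neuron classes, connectivity, and parameter values here are placeholders of our choosing rather than a definitive API reference:

    import brainpy as bp

    pre = bp.dyn.LifRef(100)   # presynaptic population
    post = bp.dyn.LifRef(10)   # postsynaptic population

    stdp = bp.dyn.STDP_Song2001(
        pre=pre,
        delay=1.,                                  # transmission delay (ms)
        syn=bp.dyn.Expon.desc(post.num, tau=5.),   # exponential synaptic dynamics
        comm=bp.dnn.EventCSRLinear(                # sparse, event-driven weight matrix
            bp.conn.FixedProb(0.1, pre=pre.num, post=post.num),
            weight=bp.init.Uniform(0., 0.1)),
        out=bp.dyn.COBA.desc(E=0.),                # conductance-based synaptic output
        post=post,
    )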

    A second weakness of the paper concerns the demos and benchmarks used to demonstrate the versatility and performance of BrainPy, which are not sufficiently described. In Fig. 4, it is for example not explained how the reservoirs are trained (only the readout weights, or also the recurrent ones? Using BPTT only makes sense when the recurrent weights are also trained.), nor how many neurons they have, what the final performance is, etc. The comparison with NEURON, NEST, and Brian2 is hard to trust without detailed explanations. Why are different numbers of neurons used for COBA and COBAHH? How long is the simulation in each setting? Which time is measured: the total time including compilation and network creation, or just the simulation time? Are the same numerical methods used for all simulators? It would also be interesting to discuss why the only result involving TPUs (Fig 8c) shows that it is worse than the V100 GPU. What could be the reason? Are there biologically-realistic networks that would benefit from a TPU? As the support for TPUs is a major selling point of BrainPy, it would be important to investigate its usage further.

    We thank the reviewer for raising the important questions about the demos and benchmarks used to demonstrate the versatility and performance of BrainPy. To address these concerns, we have added more details in the revised paper, including:

    • In Fig. 4, we explain how the reservoirs are trained in Appendix 10: only the readout weights are trained, and they are trained using backpropagation, FORCE learning, and ridge regression, respectively. We also specify the number of neurons in each reservoir (see L1397) and the final performance of the reservoirs on the task (see Figure 4). A minimal sketch of this readout-only setup appears after this list.

    • To enable readers to better interpret the simulator comparisons in Fig. 8, we have also added more detailed explanations of the comparison with NEURON, NEST, and Brian2 in Appendix 11.

    • In the revised paper, we provide a comprehensive analysis of BrainPy's compatibility with different hardware platforms, including TPUs, and identify the specific conditions under which TPUs may offer advantages (see Figure 8 and Appendix 11—figure 7). We have also discussed the potential benefits of TPUs for biologically realistic networks (see L514 - L521). In particular, for biological networks with arbitrary sparsity, TPUs do not show an advantage over GPUs (see Appendix 11—figure 7); TPUs are best at exploiting certain kinds of structured sparsity, for example block sparsity.
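    Referring to the readout-only training in the first bullet above, the following is a minimal echo-state-network sketch, assuming BrainPy's bp.layers.Reservoir and bp.train.RidgeTrainer utilities; the sizes and the regularization strength are arbitrary:

    import brainpy as bp
    import brainpy.math as bm

    class ESN(bp.DynamicalSystem):
        def __init__(self, num_in, num_hidden, num_out):
            super().__init__(mode=bm.batching_mode)
            self.reservoir = bp.layers.Reservoir(num_in, num_hidden)  # fixed random recurrence
            self.readout = bp.layers.Dense(num_hidden, num_out,
                                           mode=bm.training_mode)     # the only trainable part

        def update(self, x):
            return self.readout(self.reservoir(x))

    model = ESN(num_in=1, num_hidden=100, num_out=1)
    trainer = bp.train.RidgeTrainer(model, alpha=1e-6)  # fits the readout in closed form
    # trainer.fit([inputs, targets])  # arrays of shape (batch, time, feature)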

  2. eLife assessment

    The paper introduces a new, important framework for neural modelling that promises to offer efficient simulation and analysis tools for a wide range of biologically-realistic neural networks. The paper's examples provide solid support for the ease of use and flexibility of the framework, but the comparison to existing solutions (in particular in terms of accuracy and performance) is incomplete. With a more careful evaluation of the tool's strengths and limitations, the work would be of interest to a wide range of computational neuroscientists and researchers working on biologically inspired machine learning applications.

  3. Reviewer #1 (Public Review):

    Chaoming Wang and coauthors present a new framework for modeling neurons and networks of neurons, spanning a wide range of possible models from detailed (point-neuron) models with non-linear ion channel dynamics to more abstract rate neuron models. Models are defined in an object-oriented style, familiar to users of machine-learning frameworks like PyTorch, and are efficiently executed via the just-in-time compilation framework JAX/XLA. The programming paradigm naturally supports a hierarchical style, where e.g. a network is composed of neurons that contains ion channels; each of these components can be reused in different contexts and be simulated/analyzed individually.

    Strengths:
    Brainpy's approach is an innovative application of state-of-the-art technology widely used in the machine learning community (auto-differentiation, just-in-time compilation) to modeling in computational neuroscience and could provide a useful bridge between the two domains which overlap more and more. For researchers, describing, running, and optimizing their models in Python is very convenient. The use of Numba to write efficient operators for JAX/XLA is innovative and potentially very powerful.

    The modeling framework is very flexible, where most types of models commonly used in computational neuroscience can be readily expressed.

    The framework supports various integration algorithms for ODEs, SDEs, and FDEs, several additional convenience tools for model training, optimization, and analysis, as well as many pre-defined ion-channel, neuron, and synapse models. The wide range of included simulation and analysis tools and pre-defined models is impressive, and exceeds those offered by most competing software. The software comes with extensive documentation, tutorials, and examples, on par with that of existing simulators that have been around for much longer.

    Weaknesses:
    While the article clearly outlines the strengths of the chosen approach, it lacks an equally clear exposition of its limitations and a more thorough comparison to established approaches. Two examples of limitations that should be stated more clearly, in my opinion: models need to be small enough to fit on a single machine (in contrast to e.g. NEURON and NEST which support distributed computation via MPI), and only single-compartment models are supported; both limitations are mentioned in passing in the discussion, but would merit a more upfront mention. Regarding the comparison to other approaches/simulators:

    1. The study does not verify the accuracy of the presented framework. While its basic approach (time-step-based simulation, standard numerical integration algorithms) is sufficiently similar to other software to not expect major discrepancies, an explicit comparison would remove any doubt. Quantitative measures of accuracies are particularly important in the context of benchmarks (see below), since simulations can be made arbitrarily fast by sacrificing performance.
    2. Benchmarking against other software is obviously important, but also full of potential pitfalls. The current article does not state clearly whether the results are strictly comparable. In particular: are the benchmarks on the different simulators calculating results to the same accuracy (use of single or double precision, same integration algorithm, etc.)? Does each simulator use the fastest possible execution mode (e.g. number of threads/processes for NEST, C++ standalone mode in Brian2, etc.)? What is exactly measured (compilation time, network generation time, simulation execution time, ...) - these components will scale differently with network size and simulation duration, so summing them up makes the results difficult to interpret. Details are also missing for the comparison between the XLA operator customization in C++ vs. Python: was the C++ variant written by the authors or by someone else? Does the NUMBA→XLA mechanism also support GPUs/TPUs? This comparison also seems to be missing from the GitHub repository provided for reproducing the paper results.
    3. While the authors convincingly argue for the merits of their Python-based/object-oriented approach, in my opinion, they do not fully acknowledge the advantages of domain-specific languages (NMODL, NestML, equation syntax of ANNarchy and Brian2, ...). In particular, such languages aim at a strong decoupling of the mathematical model description from its implementation and other parts of the model. In contrast, models described with BrainPy's approach often need to refer to such details, e.g. be aware of differences between dense and sparse connectivity schemes, online, or batch mode, etc. It might also be worth mentioning descriptive approaches to synaptic connectivity as supported by other simulators (connection syntax in Brian2, Connection Set Algebra for NEST).
  4. Reviewer #2 (Public Review):

    This manuscript introduces an integrative framework for modelling and analysis in neuroscience called BrainPy. It describes the many tools and utilities for building a wide range of models with an accessible and extensible unified interface written in Python. Several illustrative examples are provided for common use cases, including how to extend the existing classes to incorporate new features, demonstrating its ease of use and adherence to Python's programming conventions for integrative modelling across multiple scales and paradigms. The provided benchmarks also demonstrate that despite the convenience of presenting a high-level interpreted language to the user, it provides orders of magnitude of computational speed-up relative to three popular alternative frameworks on the chosen simulations through the extensive use of several Just In Time compilers. Computational benchmarks are also provided to illustrate the speed-up gained from running the models on massively parallel processing hardware, including GPUs, suggesting leading computational performance across a wide range of use cases.

    While the results presented are impressive, publishing further details of the benchmarks in an appendix would be helpful for evaluating the claims and the overall conclusion would be more convincing if the performance benefits were demonstrated on a wider selection of test cases. Unsatisfyingly, the authors gave up on making a direct comparison to Brian running on GPUs with GeNN which would have been a fairer comparison than CPU-based simulations. The code for the chosen benchmarks is also likely to be highly optimised by the authors for running on BrainPy but less so for the other platforms - a fairer test would be to invite the authors of the other simulators to optimise the same models and re-evaluate the benchmarks. Furthermore, the manuscript reads like an advertisement for the platform with very little discussion of its limitations, weaknesses, or directions for further improvement. A more frank and balanced perspective would strengthen the manuscript and give the reader greater confidence in the platform.

    Since simulators wax and wane in popularity, it would be reassuring to see a roadmap for development with a proposed release cadence and a sustainable governance policy for the project. This would serve to both clearly indicate the areas of active development where community contributions would be most valuable and also to reassure potential users that the project is unlikely to be abandoned in the near future, ensuring that their time investment in learning to use the framework will not be wasted. Similarly, a complex set of dependencies, which need to be modified for BrainPy, will likely make the project hard to maintain and so a similar plan to those given for the CI pipeline and documentation generation for automation of these modifications would be a good addition. It is also important to periodically reflect on whether it still makes sense to combine all the disparate tools into one framework as the codebase grows and starts to strain under modifications required to maintain its unification.

    Finally, a live demonstration would be a very useful addition to the project. For example, a Jupyter notebook hosted on mybinder.org or similar, and a fully configured Docker image, would each enable potential users to quickly experiment with BrainPy without having to install a stack of dependencies and troubleshoot version conflicts with their pre-existing setup. This would greatly lower the barrier to adoption and help to convince a larger base of modellers of the potential merits of BrainPy, which could be major, both in terms of the computational speed-up and ease of development for a wide range of modelling paradigms.

  5. Reviewer #3 (Public Review):

    The paper presents the novel neuro-simulator BrainPy, which introduces several new concepts compared to existing simulators such as NEST, Brian, or GeNN: 1) a modular and Pythonic interface, which avoids having to use a fixed set of neural/synaptic models or using a textual equation-oriented interface; 2) a common platform for simulation, training, and analysis; 3) the use of just-in-time compilation using JAX/XLA, allowing to transparently access CPU, GPU, and TPU platforms. While none of these features is new per se (apart from TPU support, as far as I know), their combination provides an interesting new direction for the design of neuro-simulators.

    Overall, BrainPy is a nice and valuable addition to the already overwhelming list of neuro-simulators, which all have their own advantages and drawbacks and are diversely maintained. The main strengths of BrainPy are 1) its multi-scale modular interface and 2) the possibility for the user to transparently use various hardware platforms for the simulation. The paper succeeds in explaining those two aspects in a convincing manner. The paper is also very didactic in explaining the different strengths and weaknesses of the current simulators, as well as the benefits of JIT compilation.

    One potential issue is that the scope of the neuro-simulator is not very clearly explained and the target audience is not well defined: is BrainPy primarily intended for computational neuroscientists or for neuro-AI practitioners? The simulator offers very detailed neural models (HH, fractional order models), classical point-models (LIF, AdEx), rate-coded models (reservoirs), but also deep learning layers (Conv, MaxPool, BatchNorm, LSTM). Is there an advantage to using BrainPy rather than PyTorch for purely deep networks? Is it possible to build hybrid models combining rate-coded reservoirs or convnets with a network of HH neurons? Without such a hybrid approach, it is unclear why the deep learning layers are needed. In terms of plasticity, only external training procedures are implemented (backpropagation, FORCE, surrogate gradients). No local plasticity mechanism (Hebbian learning for rate-coded networks, STDP and its variants for spiking networks) seems to be implemented, apart from STP. Is it a planned feature? Appendix 8 refers to `bp.synplast.STDP()`, but it is not present in the current code (https://github.com/brainpy/BrainPy/tree/master/brainpy/_src/dyn/synplast). Spiking networks without STDP are not going to be very useful to computational neuroscientists, so this suggests that the simulator targets primarily neuro-AI, i.e. AI researchers interested in using spiking models in a machine learning approach. However, it is unclear why they would be interested in HH or Morris-Lecar models rather than simpler LIF neurons.

    A second weakness of the paper concerns the demos and benchmarks used to demonstrate the versatility and performance of BrainPy, which are not sufficiently described. In Fig. 4, it is for example not explained how the reservoirs are trained (only the readout weights, or also the recurrent ones? Using BPTT only makes sense when the recurrent weights are also trained.), nor how many neurons they have, what the final performance is, etc. The comparison with NEURON, NEST, and Brian2 is hard to trust without detailed explanations. Why are different numbers of neurons used for COBA and COBAHH? How long is the simulation in each setting? Which time is measured: the total time including compilation and network creation, or just the simulation time? Are the same numerical methods used for all simulators? It would also be interesting to discuss why the only result involving TPUs (Fig 8c) shows that it is worse than the V100 GPU. What could be the reason? Are there biologically-realistic networks that would benefit from a TPU? As the support for TPUs is a major selling point of BrainPy, it would be important to investigate its usage further.