Removing Temporal Dependencies in Spiking Neural Networks: A Feasibility Study on State Space Reformulation for Parallel-Trainable SNN Language Modeling


Abstract

Spiking Neural Networks (SNNs) offer potential energy efficiency through event-driven computation, yet their inherent temporal dependencies—analogous to those in RNNs—have prevented them from being trained from scratch at the parameter regimes occupied by modern language models. We propose SNN-SSM, a framework that reformulates Leaky Integrate-and-Fire (LIF) neuron dynamics as a State Space Model (SSM), enabling parallel training via closed-form FFT convolution while preserving sparse spiking activations at inference time. To bridge the gap between the linear convolution used during training and the non-linear reset used during recurrent inference, we introduce a reset annealing schedule that gradually exposes the model to reset dynamics over the course of training. A four-way ablation of the reset annealing coefficient reveals an accuracy–gap trade-off: in controlled experiments on WikiText-2 at 11.1M parameters using a single NVIDIA Tesla T4 GPU, the accuracy-optimized configuration reaches a best validation perplexity of 75.8 with a 2.2-point train–inference gap, while the gap-optimized default configuration reaches a perplexity of 78.9 with the train–inference gap reduced to 0.46. Both configurations dramatically outperform a matched-size ANN baseline (152.1 PPL) and trail a matched-size Mamba reference (72.0 PPL) by 4–7 points. We interpret the gap to Mamba as the quantitative cost of committing to binary spike activations; under standard CMOS cost models (Horowitz, 2014), the resulting sparse activation pattern corresponds to an estimated ≈94% energy reduction on idealized neuromorphic hardware. Our contributions are (i) a demonstration that temporal dependencies can be removed from SNNs without collapsing sparsity, (ii) the reset annealing mechanism that makes the train-as-SSM / infer-as-SNN paradigm practically viable, and (iii) a staged empirical study including a faithful pure-PyTorch Mamba reference implementation and a four-way reset annealing ablation.
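The core idea behind the train-as-SSM formulation can be illustrated with a minimal sketch. The paper's actual SNN-SSM architecture is not shown here; the snippet below only demonstrates, under the simplifying assumption that the reset is disabled (as during early training under reset annealing), that the linear LIF membrane recurrence h[t] = β·h[t-1] + x[t] is a causal convolution with the kernel β^j, and can therefore be evaluated in parallel via FFT instead of a sequential loop. The function names and the decay value β = 0.9 are illustrative, not taken from the paper.

```python
import numpy as np

def lif_recurrent(x, beta):
    # Sequential leaky integration (reset-free LIF membrane):
    # h[t] = beta * h[t-1] + x[t]
    h = np.zeros_like(x)
    acc = 0.0
    for t in range(len(x)):
        acc = beta * acc + x[t]
        h[t] = acc
    return h

def lif_fft_conv(x, beta):
    # The same recurrence written in closed form,
    #   h[t] = sum_{j=0..t} beta**j * x[t-j],
    # i.e. a causal convolution with kernel k[j] = beta**j,
    # evaluated in O(T log T) via FFT. Zero-padding to 2T turns
    # the circular convolution into the desired linear one.
    T = len(x)
    k = beta ** np.arange(T)
    n = 2 * T
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)[:T]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h_seq = lif_recurrent(x, beta=0.9)
h_par = lif_fft_conv(x, beta=0.9)
print(np.allclose(h_seq, h_par))  # the two computation paths agree
```

At inference the model instead runs the sequential form with binary spiking and reset re-enabled; the reset annealing schedule described above is what keeps the outputs of the two regimes close.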
