State Space Models as CPU-Native Neural Network Architectures

Gabriel Zo-Hasina Rasatavohary

Abstract

We present experimental evidence that State Space Models (SSMs) are structurally advantageous for neural network inference on CPU and ARM architectures. By porting BitMamba-2, a 1.58-bit quantized Mamba implementation, from x86 AVX2 to ARM NEON, we achieve 82.5 tokens/sec (255M parameters) and 29.6 tokens/sec (1B parameters) on an Apple M1 processor — the first published ARM benchmark for this model family. We experimentally validate the O(1) memory property of SSMs: generation speed remains perfectly constant across sequence lengths from 50 to 200+ tokens, in contrast to Transformer-based models whose memory grows linearly with context via the KV cache. At comparable model weight sizes (~600 MB), the SSM achieves throughput competitive with quantized Transformers (~30–40 tokens/sec) while offering constant memory footprint and 1.58-bit compression (vs. 4-bit for Transformers). These results support the thesis that mathematical reformulations — here, the combination of state space recurrence with ternary quantization — can make non-GPU inference structurally competitive rather than merely tolerable.
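The constant-memory claim can be illustrated with a toy sketch. It is not BitMamba-2's code: the function names (`ssm_step`, `run_ssm`, `run_kv_cache`, `ternary_bits_per_weight`) and the diagonal scalar-input recurrence are assumptions chosen for illustration. The sketch shows that a state space recurrence carries a fixed-size state from token to token, that a KV cache gains one entry per token, and that a ternary weight in {-1, 0, +1} carries log2(3), roughly 1.58, bits of information.

```python
import math


def ssm_step(state, x, a, b, c):
    """One step of a toy diagonal linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t  (elementwise),  y_t = sum(c * h_t)
    The returned state has the same length as the input state.
    """
    new_state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y


def run_ssm(xs, a, b, c):
    """Run the recurrence over xs, recording the state size after each token."""
    state = [0.0] * len(a)
    outputs, sizes = [], []
    for x in xs:
        state, y = ssm_step(state, x, a, b, c)
        outputs.append(y)
        sizes.append(len(state))  # stays at len(a) for every token
    return outputs, sizes


def run_kv_cache(xs):
    """Toy stand-in for a Transformer KV cache: one (key, value) pair per token."""
    cache, sizes = [], []
    for x in xs:
        cache.append((x, x))  # placeholder key and value
        sizes.append(len(cache))  # grows by one entry per token
    return sizes


def ternary_bits_per_weight():
    """Information content of one weight drawn from {-1, 0, +1}."""
    return math.log2(3)


if __name__ == "__main__":
    tokens = [1.0] * 200
    _, ssm_sizes = run_ssm(tokens, a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0])
    kv_sizes = run_kv_cache(tokens)
    print("SSM state size at token 50 and 200:", ssm_sizes[49], ssm_sizes[199])
    print("KV cache entries at token 50 and 200:", kv_sizes[49], kv_sizes[199])
    print("bits per ternary weight:", round(ternary_bits_per_weight(), 3))
```

In a real model the state is a tensor per layer and the cache holds per-layer key and value tensors, but the scaling is the same: the recurrent state does not depend on how many tokens have been processed, while the cache grows with every token.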

Version published to 10.31224/6680
Mar 24, 2026

Related articles

Cross-Platform Inference of 1.58-bit State Space Models: ARM NEON vs x86 AVX-512 vs GPU CUD
This article has 1 author:
1. Gabriel Zo-Hasina Rasatavohary
This article has no evaluations
Latest version Mar 25, 2026

Convergent Architectures: Computational Vossels and Compute-in-Memory State Space Models as a Unified Framework for Edge AI
This article has 1 author:
1. Greg Passmore
This article has no evaluations
Latest version Apr 14, 2026

I/O for LLM Inference: A Survey of Storage and Memory Bottlenecks
This article has 1 author:
1. Rajarshi Chowdhury
This article has no evaluations
Latest version Mar 19, 2026
