Sparse Projection Attention: A Computationally Efficient Framework for Long Sequence Modeling

Abstract

The self-attention mechanism has revolutionized sequence modeling but suffers from quadratic computational complexity with respect to sequence length, limiting its applicability to long sequences. We propose Sparse Projection Attention (SPA), a novel attention variant that leverages learnable sparse projections to reduce the effective dimensionality of queries and keys while maintaining expressive power. Our method is grounded in the Johnson-Lindenstrauss lemma and provides theoretical guarantees on distance preservation. We introduce a comprehensive mathematical framework including error bounds, convergence analysis, and gradient dynamics. Experimental results on language modeling, machine translation, and long-range sequence classification demonstrate that SPA achieves up to an 8× computational speedup while maintaining competitive performance relative to standard attention and other sparse variants. The proposed approach offers an effective trade-off between computational efficiency and model expressivity for long-sequence tasks, making transformers more accessible for resource-constrained environments and real-time applications.
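
To make the mechanism described in the abstract concrete, the sketch below illustrates the general idea of computing attention after projecting queries and keys into a lower-dimensional space. The abstract does not specify the structure of SPA's learnable sparse projections, so this minimal example substitutes a dense Gaussian random projection (in the spirit of the Johnson-Lindenstrauss lemma); the function and parameter names (projected_attention, r, seed) are illustrative and not the authors' API. Reducing the projection dimension only lowers the cost of the score computation from O(n^2 d) to O(n^2 r); the speedups reported in the paper presumably rely on the full SPA framework rather than this simplification.

# Minimal sketch, not the authors' implementation: attention with queries and
# keys projected from d to r < d dimensions before computing scores.
# A learned sparse projection (as described in the abstract) would replace
# the dense Gaussian matrix used here for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def projected_attention(Q, K, V, r, seed=0):
    """Q, K: (n, d); V: (n, d_v). Attend using r-dimensional projections of Q and K."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    # Random projection matrix (d, r); SPA would learn a sparse matrix here (assumption).
    P = rng.normal(0.0, 1.0 / np.sqrt(r), size=(d, r))
    Qp, Kp = Q @ P, K @ P                 # (n, r) each
    scores = Qp @ Kp.T / np.sqrt(r)       # (n, n) scores computed in the reduced space:
                                          # O(n^2 r) instead of O(n^2 d)
    return softmax(scores, axis=-1) @ V   # (n, d_v)

# Toy usage: inner products in the r-dim space approximate those in the d-dim space.
n, d, r = 128, 512, 64
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = projected_attention(Q, K, V, r)
print(out.shape)  # (128, 512)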
