Zarvan: An Efficient Gated Architecture for Sequence Modeling with Linear Complexity
Abstract
The Transformer architecture has become the de facto standard for sequence modeling tasks but is hampered by the quadratic complexity, $O(n^2)$, of its self-attention mechanism, rendering it inefficient for long sequences. To address this limitation, we introduce Zarvan, a novel gated architecture for sequence modeling with linear complexity, $O(n)$. Zarvan replaces the self-attention mechanism with a dual-context gating system. Motivated by the finding that a single global summary is insufficient for precise information retrieval, Zarvan computes two distinct context vectors in parallel: a Holistic Context that captures the overall gist of the sequence, and an Associative Context that focuses on important, sparse information. These context vectors inform an intelligent gating mechanism that modulates the information flow for each token. We conduct a comprehensive set of experiments across diverse domains, including text classification (IMDb), information retrieval (MS MARCO), vision-as-sequence (MNIST), and challenging synthetic benchmarks such as the Selective Copy task, which Zarvan solves perfectly, demonstrating precise long-range memory. The results show that Zarvan achieves accuracy that is highly competitive with, and in some cases superior to, the standard Transformer, while exhibiting significantly better computational efficiency and scalability. The code and experimental setups are available at https://github.com/systbs/zarvan/.
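To make the dual-context gating idea concrete, the following is a minimal sketch of how such a layer could be structured. It is an illustration based only on the description in this abstract, not the official implementation at https://github.com/systbs/zarvan/: the class name `DualContextGate`, the use of mean pooling for the holistic context, the learned importance scores for the associative context, and all dimensions are assumptions made for clarity.

```python
# Illustrative sketch of a dual-context gating layer (assumed design, not the
# official Zarvan code). All names and design choices here are hypothetical.
import torch
import torch.nn as nn


class DualContextGate(nn.Module):
    """Gates each token with two sequence-level context vectors in O(n) time."""

    def __init__(self, d_model: int):
        super().__init__()
        # Holistic context: a mean over tokens summarizes the whole sequence.
        self.holistic_proj = nn.Linear(d_model, d_model)
        # Associative context: learned scalar scores emphasize sparse, salient tokens.
        self.importance = nn.Linear(d_model, 1)
        self.associative_proj = nn.Linear(d_model, d_model)
        # Gate combines both contexts with the token itself.
        self.gate = nn.Linear(3 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        holistic = self.holistic_proj(x.mean(dim=1, keepdim=True))           # (B, 1, D)
        weights = torch.softmax(self.importance(x), dim=1)                   # (B, L, 1)
        associative = self.associative_proj(
            (weights * x).sum(dim=1, keepdim=True))                          # (B, 1, D)
        # Broadcast both contexts to every token and compute a per-token gate.
        ctx = torch.cat(
            [x, holistic.expand_as(x), associative.expand_as(x)], dim=-1)    # (B, L, 3D)
        return x * torch.sigmoid(self.gate(ctx))                             # gated tokens


if __name__ == "__main__":
    layer = DualContextGate(d_model=64)
    tokens = torch.randn(2, 128, 64)   # batch of 2 sequences, 128 tokens each
    print(layer(tokens).shape)         # torch.Size([2, 128, 64])
```

Every operation in this sketch is either a per-token linear map or a single pooling pass over the sequence, so its cost grows linearly with sequence length, consistent with the $O(n)$ complexity claimed above.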