Weighted Sliding Attention: Adaptive Gaussian Decay for Context-Sensitive Local Transformers

Abstract

Weighted Sliding Attention (WSA) is a lightweight attention mechanism that introduces a learnable Gaussian decay within a fixed-size window. Unlike traditional sliding-window methods that apply uniform attention to all neighboring tokens, WSA learns to adjust its attention span dynamically through a single trainable parameter: sigma (σ). This allows the model to focus more narrowly in noisy contexts or broaden its view when patterns are stable and well-formed. We evaluate WSA on three synthetic benchmarks—gradient trends, symmetrical palindromes, and noisy distractor sequences—and observe that the learned σ parameter adapts meaningfully to the structure of each task. Our results demonstrate that WSA not only performs competitively but also exhibits interpretable behaviors, making it a promising alternative for resource-constrained or cognitively informed transformer models. This work explores how dynamic attention width, guided by learned trust in local context, can improve both robustness and transparency in modern attention-based architectures.
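The abstract does not specify the implementation, but the mechanism it describes (a fixed-size attention window whose scores are modulated by a learnable Gaussian decay over relative distance, controlled by a single σ parameter) can be illustrated with a minimal PyTorch-style sketch. The class name, window size, σ parameterization, and single-head layout below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedSlidingAttention(nn.Module):
    """Sketch of sliding-window self-attention with a learnable Gaussian decay (σ)."""

    def __init__(self, d_model: int, window: int = 8, init_sigma: float = 2.0):
        super().__init__()
        self.window = window                      # neighbors attended to on each side (assumed)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Single trainable parameter controlling the attention span (log-space for positivity).
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())

    def forward(self, x):                         # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / D ** 0.5        # (B, T, T)

        # Relative distance |i - j| between query position i and key position j.
        pos = torch.arange(T, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs().float() # (T, T)

        # Gaussian decay: nearer tokens get larger weight; σ sets the effective width.
        sigma = self.log_sigma.exp()
        decay = -(dist ** 2) / (2 * sigma ** 2)

        # Hard sliding-window mask: tokens beyond the window are excluded entirely.
        mask = dist > self.window
        scores = (scores + decay).masked_fill(mask, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)
```

Under this reading, a small learned σ concentrates attention on immediate neighbors (useful for noisy distractor sequences), while a larger σ flattens the decay toward uniform attention across the window (useful when local patterns are stable), which is consistent with the adaptive-span behavior the abstract reports.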
