Weighted Sliding Attention: Adaptive Gaussian Decay for Context-Sensitive Local Transformers

Abstract

Weighted Sliding Attention (WSA) is a lightweight attention mechanism that introduces a learnable Gaussian decay within a fixed-size window. Unlike traditional sliding-window methods that apply uniform attention to all neighboring tokens, WSA learns to adjust its attention span dynamically through a single trainable parameter: sigma (σ). This allows the model to focus more narrowly in noisy contexts or broaden its view when patterns are stable and well-formed. We evaluate WSA on three synthetic benchmarks—gradient trends, symmetrical palindromes, and noisy distractor sequences—and observe that the learned σ parameter adapts meaningfully to the structure of each task. Our results demonstrate that WSA not only performs competitively but also exhibits interpretable behaviors, making it a promising alternative for resource-constrained or cognitively informed transformer models. This work explores how dynamic attention width, guided by learned trust in local context, can improve both robustness and transparency in modern attention-based architectures.
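The abstract does not specify the implementation, but the mechanism it describes (a fixed-size attention window whose scores are modulated by a learnable Gaussian decay over relative distance, controlled by a single σ parameter) can be illustrated with a minimal PyTorch-style sketch. The class name, window size, σ parameterization, and single-head layout below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedSlidingAttention(nn.Module):
    """Sketch of sliding-window self-attention with a learnable Gaussian decay (σ)."""

    def __init__(self, d_model: int, window: int = 8, init_sigma: float = 2.0):
        super().__init__()
        self.window = window                      # neighbors attended to on each side (assumed)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Single trainable parameter controlling the attention span (log-space for positivity).
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())

    def forward(self, x):                         # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / D ** 0.5        # (B, T, T)

        # Relative distance |i - j| between query position i and key position j.
        pos = torch.arange(T, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs().float() # (T, T)

        # Gaussian decay: nearer tokens get larger weight; σ sets the effective width.
        sigma = self.log_sigma.exp()
        decay = -(dist ** 2) / (2 * sigma ** 2)

        # Hard sliding-window mask: tokens beyond the window are excluded entirely.
        mask = dist > self.window
        scores = (scores + decay).masked_fill(mask, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)
```

Under this reading, a small learned σ concentrates attention on immediate neighbors (useful for noisy distractor sequences), while a larger σ flattens the decay toward uniform attention across the window (useful when local patterns are stable), which is consistent with the adaptive-span behavior the abstract reports.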
