Translution: Unifying Self-Attention and Convolution for Adaptive and Relative Modelling

Abstract

When modeling a given type of data, we consider two key aspects: (1) identifying the elements (e.g., image pixels or textual words) relevant to a central element, as in a convolutional receptive field, or to a query element, as in self-attention; and (2) encoding these elements effectively. Self-attention can adaptively identify relevant elements but relies on absolute positional embeddings for structural representation learning. In contrast, convolution encodes elements in a relative manner, yet its fixed kernel size limits its ability to adaptively select relevant elements. In this paper, we introduce Translution, an operation that unifies the adaptive identification capability of self-attention with the relative encoding advantage of convolution. However, this integration leads to a substantial increase in the number of parameters, exceeding most currently available computational resources. We therefore propose a lightweight variant, named \(\alpha\)-Translution. Experiments on computer vision and natural language processing tasks show that Translution (including \(\alpha\)-Translution) achieves superior accuracy compared to self-attention, demonstrating its potential for building the next generation of deep neural networks. The code is available at https://github.com/hehefan/Translution.
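The abstract does not spell out the operation, so the following is only a minimal, illustrative sketch of the idea as described: each query scores a window of neighbours adaptively (as in self-attention), while each neighbour is projected by a weight matrix indexed by its relative offset (as in a convolution kernel). The class name, window size, and shapes are assumptions for illustration, not taken from the linked repository; note how the per-offset weight bank also illustrates the parameter blow-up the abstract mentions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranslutionSketch(nn.Module):
    """Illustrative 1-D sketch (not the authors' implementation): attention
    whose key/value projections depend on the relative offset between the
    query and its neighbour, so no absolute positional embedding is needed."""

    def __init__(self, dim: int, window: int = 7):
        super().__init__()
        self.dim, self.window = dim, window
        self.q = nn.Linear(dim, dim)
        # One key and one value projection per relative offset, analogous to
        # a convolution having one kernel slice per offset. This per-offset
        # weight bank is the source of the large parameter count.
        self.k = nn.ModuleList([nn.Linear(dim, dim) for _ in range(window)])
        self.v = nn.ModuleList([nn.Linear(dim, dim) for _ in range(window)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D)
        B, L, D = x.shape
        pad = self.window // 2
        xp = F.pad(x, (0, 0, pad, pad))  # zero-pad along the sequence axis
        q = self.q(x)                    # (B, L, D)
        # Project each neighbour with the weight matrix of its offset.
        ks = torch.stack([self.k[o](xp[:, o:o + L]) for o in range(self.window)], dim=2)
        vs = torch.stack([self.v[o](xp[:, o:o + L]) for o in range(self.window)], dim=2)
        # (B, L, W): adaptive scores over the window, as in self-attention;
        # the encoding of each neighbour depends only on its relative offset,
        # as in convolution.
        attn = torch.einsum('bld,blwd->blw', q, ks) / D ** 0.5
        attn = attn.softmax(dim=-1)
        return torch.einsum('blw,blwd->bld', attn, vs)

# Example usage:
# layer = TranslutionSketch(dim=64)
# y = layer(torch.randn(2, 10, 64))  # (2, 10, 64)
```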