DAA: Dynamic Attention Allocation improves large-scale model reasoning
Abstract
The Transformer architecture has significantly advanced natural language processing (NLP) and has been foundational in developing large language models (LLMs) such as LLaMA and ChatGPT. Despite their superior accuracy, LLMs present unique challenges in practical inference owing to their compute- and memory-intensive nature. Multi-Head Attention is one of the key components of LLMs and can account for over 50% of their memory and compute requirements. We observe a high degree of redundancy among heads in which tokens they attend to across different sequences. Based on this finding, we propose Dynamic Head Attention Allocation (DAA). DAA applies two-stage self-attention within chunks and across layers, combining attention heads that exhibit a high degree of correlation; this unifies local and global attention, reducing both memory and compute.
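To illustrate the redundancy observation underlying DAA, the following is a minimal sketch of how inter-head redundancy might be measured: it compares per-head attention maps by cosine similarity and greedily groups heads whose maps are nearly identical. The function names, the similarity threshold, and the greedy grouping heuristic are illustrative assumptions, not the paper's actual allocation algorithm.

```python
import torch

def head_attention_correlation(attn: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between per-head attention maps.

    attn: [num_heads, seq_len, seq_len] attention weights from one layer.
    Returns a [num_heads, num_heads] similarity matrix.
    """
    flat = attn.flatten(start_dim=1)                       # [H, L*L]
    flat = flat / flat.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    return flat @ flat.T                                   # cosine similarity

def group_redundant_heads(sim: torch.Tensor, threshold: float = 0.9):
    """Greedily group heads whose attention maps exceed a similarity threshold.

    NOTE: a hypothetical grouping heuristic for illustration only.
    """
    num_heads = sim.size(0)
    assigned = [False] * num_heads
    groups = []
    for h in range(num_heads):
        if assigned[h]:
            continue
        group, assigned[h] = [h], True
        for other in range(h + 1, num_heads):
            if not assigned[other] and sim[h, other] >= threshold:
                group.append(other)
                assigned[other] = True
        groups.append(group)
    return groups

# Toy usage: 8 heads attending over a 16-token sequence.
attn = torch.softmax(torch.randn(8, 16, 16), dim=-1)
groups = group_redundant_heads(head_attention_correlation(attn))
print(groups)  # lists of heads whose attention patterns are largely redundant
```

Heads that fall into the same group attend to essentially the same tokens, so their computation and memory could in principle be shared, which is the intuition DAA builds on.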