Utilizing the Attention Mechanism for Accuracy Prediction in Quantized Neural Networks
Abstract
Quantization plays a crucial role in deploying neural network models on resource-limited hardware. However, current quantization methods suffer from large accuracy loss and poor generalization on complex tasks, which hinders the practical application of deep learning and large language models in smart systems. The core difficulty is our limited understanding of how quantization affects accuracy, compounded by the lack of effective approaches for evaluating the performance of quantized models. To address these concerns, we develop a novel method that leverages the self-attention mechanism: using a Transformer encoder and decoder, it predicts a quantized model's accuracy from a single representative image drawn from the test set. The prediction error of the quantization accuracy across three types of neural network models is only 2.35%. The proposed method enables rapid performance assessment of quantized models during development, thereby facilitating the optimization of quantization parameters and promoting the practical application of neural network models.
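The abstract does not specify the predictor's architecture beyond a Transformer encoder and decoder, so the following is only a minimal PyTorch sketch of the idea: patches of one representative test image are encoded with self-attention, and a single learned query token is decoded into an accuracy estimate. All names and sizes here (`AccuracyPredictor`, `d_model`, the 16x16 RGB patch interface) are illustrative assumptions, not the authors' published design.

```python
import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Hypothetical sketch: a Transformer encoder-decoder that maps the
    patches of a single representative image to a predicted accuracy
    for a quantized model. Layer sizes and the input interface are
    assumptions for illustration only."""

    def __init__(self, d_model=128, nhead=4, num_layers=2, num_patches=64):
        super().__init__()
        # Project flattened 16x16 RGB patches into the model dimension.
        self.patch_proj = nn.Linear(16 * 16 * 3, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        # A single learned query token is decoded into the accuracy estimate.
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())  # accuracy in [0, 1]

    def forward(self, patches):
        # patches: (batch, num_patches, 16*16*3) from one representative image
        x = self.patch_proj(patches) + self.pos_embed
        tgt = self.query.expand(x.size(0), -1, -1)
        out = self.transformer(src=x, tgt=tgt)  # self-attention over image patches
        return self.head(out[:, 0])             # predicted accuracy of the quantized model

# Usage: estimate a quantized model's accuracy from one image's patches.
predictor = AccuracyPredictor()
dummy_patches = torch.randn(1, 64, 16 * 16 * 3)
print(predictor(dummy_patches))  # e.g. tensor([[0.51]], grad_fn=...)
```

In practice such a predictor would be trained on pairs of (representative image, measured post-quantization accuracy), so that at development time a new quantization configuration can be assessed without running the full test set; that training loop is omitted here.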