Utilizing the Attention Mechanism for Accuracy Prediction in Quantized Neural Networks
Abstract
Quantization plays a crucial role in deploying neural network models on resource-limited hardware. However, current quantization methods suffer from large accuracy loss and poor generalization on complex tasks, which hinders the practical application of deep learning and large language models in smart systems. The core difficulty is our limited understanding of how quantization affects accuracy, compounded by the lack of effective approaches for evaluating the performance of quantized models. To address these concerns, we develop a novel method that leverages the self-attention mechanism: using a Transformer encoder and decoder, it predicts a quantized model's accuracy from a single representative image drawn from the test set. The prediction error of the quantization accuracy across three types of neural network models is only 2.35%. The proposed method enables rapid performance assessment of quantized models during development, thereby facilitating the optimization of quantization parameters and promoting the practical application of neural network models.
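The abstract does not specify the predictor's architecture beyond a Transformer encoder and decoder, so the following is only a minimal PyTorch sketch of the idea: patches of one representative test image are encoded with self-attention, and a single learned query token is decoded into an accuracy estimate. All names and sizes here (`AccuracyPredictor`, `d_model`, the 16x16 RGB patch interface) are illustrative assumptions, not the authors' published design.

```python
import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Hypothetical sketch: a Transformer encoder-decoder that maps the
    patches of a single representative image to a predicted accuracy
    for a quantized model. Layer sizes and the input interface are
    assumptions for illustration only."""

    def __init__(self, d_model=128, nhead=4, num_layers=2, num_patches=64):
        super().__init__()
        # Project flattened 16x16 RGB patches into the model dimension.
        self.patch_proj = nn.Linear(16 * 16 * 3, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        # A single learned query token is decoded into the accuracy estimate.
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())  # accuracy in [0, 1]

    def forward(self, patches):
        # patches: (batch, num_patches, 16*16*3) from one representative image
        x = self.patch_proj(patches) + self.pos_embed
        tgt = self.query.expand(x.size(0), -1, -1)
        out = self.transformer(src=x, tgt=tgt)  # self-attention over image patches
        return self.head(out[:, 0])             # predicted accuracy of the quantized model

# Usage: estimate a quantized model's accuracy from one image's patches.
predictor = AccuracyPredictor()
dummy_patches = torch.randn(1, 64, 16 * 16 * 3)
print(predictor(dummy_patches))  # e.g. tensor([[0.51]], grad_fn=...)
```

In practice such a predictor would be trained on pairs of (representative image, measured post-quantization accuracy), so that at development time a new quantization configuration can be assessed without running the full test set; that training loop is omitted here.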