Experience-guided, Mixed-precision Matrix Multiplication with Apache TVM for ARM processors

Abstract

Deep learning (DL) generates new computational tasks that differ from those encountered in classical scientific applications. In particular, DL training and inference require general matrix multiplications (gemm) with matrix operands that are far from the large, square shapes typical of other scientific fields. In addition, as DL models grow in arithmetic/storage complexity, reduced precision via quantization has become mainstream for DL model inference on edge devices. Automatic code generation addresses these new types of gemm by 1) improving portability across different hardware from a single base code; 2) supporting mixed and reduced precision; and 3) enabling autotuning methods that, given a base operation, perform a (costly) optimization search for the best schedule. In this paper, we rely on Apache TVM to generate an experience-guided gemm that delivers performance competitive with the TVM Auto-scheduler, while reducing tuning time by a factor of 48×.
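The mixed-precision gemm mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's TVM-generated kernel; the shapes and dtypes below are illustrative assumptions showing the common quantized-inference pattern of narrow integer inputs with a wider accumulator:

```python
import numpy as np

# Hypothetical shapes: gemm operands in DL inference are often small and
# non-square, unlike the large square matrices of classical HPC workloads.
M, K, N = 8, 64, 16

rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(M, K), dtype=np.int8)  # quantized activations
B = rng.integers(-128, 128, size=(K, N), dtype=np.int8)  # quantized weights

# Mixed-precision gemm: int8 inputs accumulated in int32 to avoid overflow,
# the same input/accumulator split that SIMD dot-product instructions use.
C = A.astype(np.int32) @ B.astype(np.int32)

# A float64 reference confirms the int32 accumulation loses no information
# for these operand sizes.
C_ref = A.astype(np.float64) @ B.astype(np.float64)
assert np.array_equal(C, C_ref.astype(np.int32))
```

Casting to the accumulator type before the multiply is essential: multiplying two `int8` arrays directly would overflow per-element products before accumulation.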