Experience-guided, Mixed-precision Matrix Multiplication with Apache TVM for ARM processors

Abstract

Deep learning (DL) generates new computational tasks that differ from those encountered in classical scientific applications. In particular, DL training and inference require general matrix multiplications (gemm) with matrix operands that are far from the large, square shapes typical of other scientific fields. In addition, as DL models grow in arithmetic/storage complexity, reduced precision via quantization has become mainstream for DL model inference on edge devices. Automatic code generation addresses these new types of gemm by 1) improving portability across different hardware from a single base code; 2) supporting mixed and reduced precision; and 3) enabling autotuning methods that, given a base operation, perform a (costly) optimization search for the best schedule. In this paper, we rely on Apache TVM to generate an experience-guided gemm that delivers performance competitive with the TVM Auto-scheduler, while reducing tuning time by a factor of 48×.
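The mixed-precision gemm mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's TVM-generated kernel; the shapes and dtypes below are illustrative assumptions showing the common quantized-inference pattern of narrow integer inputs with a wider accumulator:

```python
import numpy as np

# Hypothetical shapes: gemm operands in DL inference are often small and
# non-square, unlike the large square matrices of classical HPC workloads.
M, K, N = 8, 64, 16

rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(M, K), dtype=np.int8)  # quantized activations
B = rng.integers(-128, 128, size=(K, N), dtype=np.int8)  # quantized weights

# Mixed-precision gemm: int8 inputs accumulated in int32 to avoid overflow,
# the same input/accumulator split that SIMD dot-product instructions use.
C = A.astype(np.int32) @ B.astype(np.int32)

# A float64 reference confirms the int32 accumulation loses no information
# for these operand sizes.
C_ref = A.astype(np.float64) @ B.astype(np.float64)
assert np.array_equal(C, C_ref.astype(np.int32))
```

Casting to the accumulator type before the multiply is essential: multiplying two `int8` arrays directly would overflow per-element products before accumulation.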