Performance Enhancement and Energy Consumption Improvement of Convolutional Neural Networks through Architecture-aware Code Optimization
Abstract
Three new architecture-aware code optimization techniques are proposed to improve the efficiency with which convolutional neural networks execute on modern processors. The focus is on reducing the executed instructions and memory accesses of the convolutional layers while opportunistically exploiting data access locality. The proposed post-compiler optimization technique unrolls the innermost loop in a manner that significantly reduces the number of loop-body instructions and memory accesses. It is shown that different loop permutations differ not only in their memory access patterns, which affect the cache miss ratio, but also in the count of executed instructions and memory requests. Processor register reuse is maximized, beyond what compiler optimizations achieve, to reduce the number of memory reference instructions. Evaluation on the gem5 full-system simulator yields a 1.6x performance improvement and a 62% reduction in energy consumption. These enhancements are achieved by a 48.3% reduction in the count of executed instructions and an 80% reduction in the D-cache miss rate, respectively.
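To make the abstract's two central ideas concrete, the sketch below contrasts a naive 1-D convolution inner loop with a hand-optimized variant that fully unrolls the innermost loop and keeps the weights and accumulator in registers. This is an illustrative sketch only, not the paper's actual kernels: the function names, the 1-D simplification, and the fixed kernel width of 3 are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline 1-D convolution: the accumulator lives in out[i], so every
   multiply re-reads and re-writes memory, and w[j] is reloaded on each
   innermost-loop iteration. */
void conv1d_baseline(const float *in, const float *w, float *out,
                     size_t n, size_t k) {
    for (size_t i = 0; i + k <= n; i++) {
        out[i] = 0.0f;
        for (size_t j = 0; j < k; j++)
            out[i] += in[i + j] * w[j];
    }
}

/* Architecture-aware variant for a fixed kernel width of 3: the innermost
   loop is fully unrolled, removing its branch and index-update instructions,
   and the weights plus the accumulator are held in locals so the compiler
   can pin them in registers. Each output costs exactly one memory store. */
void conv1d_k3_unrolled(const float *in, const float *w, float *out,
                        size_t n) {
    const float w0 = w[0], w1 = w[1], w2 = w[2]; /* weights kept in registers */
    for (size_t i = 0; i + 3 <= n; i++) {
        float acc = in[i] * w0;   /* register accumulator, no memory traffic */
        acc += in[i + 1] * w1;
        acc += in[i + 2] * w2;
        out[i] = acc;             /* single store per output element */
    }
}
```

Both routines compute the same result; the difference lies in the instruction count and the number of memory references per output, which is precisely the metric the proposed techniques target.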