A Fully Hardware-Managed Scheduling Architecture for AI Accelerators
Abstract
The spread of AI applications across diverse domains has pushed AI accelerators toward higher performance and energy efficiency. This paper addresses task scheduling in AI accelerators by introducing a fully hardware-managed scheduling system. The approach uses Operator Completion Status Registers (OCSRs) and a novel computation-scheduling instruction set to minimize software overhead and maximize execution parallelism. The co-designed hardware-software solution comprises: (1) a dedicated hardware scheduling unit with a complete instruction pipeline, (2) a compiler that maps operators to scheduling instructions and manages OCSR allocation, and (3) a lightweight runtime for efficient task dispatch. Experimental results show that the system reduces scheduling latency and improves overall throughput, achieving an average performance gain of approximately 30% across multiple CNN models with an area overhead of only 7.41%. The authors present the architecture as a new paradigm for high-efficiency AI accelerator design.
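The abstract does not spell out how OCSRs work, so the following is only a toy software model of the general idea, not the authors' design. It assumes that each OCSR is a one-bit flag set when an operator completes, and that an operator may issue once every OCSR it waits on is set. The names `Op`, `waits_on`, `sets`, and `schedule` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    waits_on: list[int]  # hypothetical: OCSR indices that must be set before issue
    sets: int            # hypothetical: OCSR index set when this operator completes


def schedule(ops, num_ocsrs):
    """Group operators into waves; operators in one wave have no pending dependencies."""
    ocsr = [False] * num_ocsrs  # assumed one-bit completion flags
    pending = list(ops)
    waves = []
    while pending:
        # An operator is ready when all OCSRs it waits on are already set.
        ready = [op for op in pending if all(ocsr[i] for i in op.waits_on)]
        if not ready:
            raise RuntimeError("deadlock: no operator is ready")
        # Completing the wave sets the corresponding OCSRs.
        for op in ready:
            ocsr[op.sets] = True
        pending = [op for op in pending if op not in ready]
        waves.append([op.name for op in ready])
    return waves


# A small diamond-shaped graph: two branches depend on conv1, and add joins them.
ops = [
    Op("conv1", [], 0),
    Op("conv2a", [0], 1),
    Op("conv2b", [0], 2),
    Op("add", [1, 2], 3),
]
waves = schedule(ops, num_ocsrs=4)
print(waves)
```

In this model the two branch operators land in the same wave, which is the kind of parallelism a hardware scheduler could exploit without software intervention. In the system described above, the equivalent dependency tracking would be encoded by the compiler and resolved by the hardware scheduling unit rather than by a software loop.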