Dynamic Kernel Selection for Real-Time ML Inference

Abstract

Real-time machine learning (ML) inference is increasingly deployed in latency-critical applications such as mobile computing, Internet-of-Things (IoT), and autonomous systems. However, existing inference frameworks typically rely on statically optimized kernels, which fail to adapt to dynamic workload variations and heterogeneous hardware conditions. This paper presents Dynamic Kernel Selection (DKS), a runtime framework that adaptively selects operator kernels during inference to optimize latency and energy efficiency. DKS integrates a lightweight profiler and decision engine that dynamically chooses among diverse kernel implementations, including precision variants (FP32, FP16, INT8), algorithmic strategies (GEMM, Winograd), and heterogeneous devices (CPU, GPU, edge accelerators). Experimental results on convolutional and transformer models (ResNet-50, MobileNetV3, BERT) show that DKS achieves up to 38.2% reduction in latency and 27.5% improvement in energy efficiency compared to state-of-the-art static baselines, while remaining within 5–7% of oracle performance. These findings highlight kernel-level adaptivity as a practical and scalable solution for efficient real-time ML inference across diverse platforms.
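The core idea described in the abstract, profiling several candidate kernel implementations at runtime and dispatching to the fastest one, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the names `select_kernel`, `gemm_naive`, and `gemm_transposed` are hypothetical stand-ins for the precision and algorithmic variants (FP32/FP16/INT8, GEMM/Winograd) that DKS would choose among, and the "profiler" here is a simple timing loop.

```python
# Illustrative sketch of dynamic kernel selection (hypothetical names;
# not the DKS implementation from the paper).
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

def gemm_naive(a: Matrix, b: Matrix) -> Matrix:
    """Triple-loop matrix multiply (stand-in for a baseline FP32 GEMM kernel)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out

def gemm_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Same product with b transposed first for better memory locality
    (stand-in for an alternative algorithmic strategy)."""
    bt = list(map(list, zip(*b)))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def select_kernel(kernels: Dict[str, Callable[[Matrix, Matrix], Matrix]],
                  a: Matrix, b: Matrix, trials: int = 3) -> Tuple[str, Callable]:
    """Lightweight profiler + decision engine: time each candidate kernel on
    the live input and return the (name, kernel) pair with the lowest
    median latency."""
    timings = {}
    for name, fn in kernels.items():
        samples = []
        for _ in range(trials):
            t0 = time.perf_counter()
            fn(a, b)
            samples.append(time.perf_counter() - t0)
        timings[name] = sorted(samples)[trials // 2]
    best = min(timings, key=timings.get)
    return best, kernels[best]

# Usage: dispatch one inference-time matmul through the selected kernel.
a = [[float(i + j) for j in range(32)] for i in range(32)]
b = [[float(i - j) for j in range(32)] for i in range(32)]
name, kernel = select_kernel({"naive": gemm_naive, "transposed": gemm_transposed}, a, b)
result = kernel(a, b)
```

In a real system the candidate set would also span precisions and devices, and the decision engine would amortize profiling cost across requests rather than re-timing every call; the gap to "oracle performance" mentioned in the abstract is exactly the overhead and misprediction cost of such a runtime selector.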
