Dynamic Kernel Selection for Real-Time ML Inference

Abstract

Real-time machine learning (ML) inference is increasingly deployed in latency-critical applications such as mobile computing, Internet-of-Things (IoT), and autonomous systems. However, existing inference frameworks typically rely on statically optimized kernels, which fail to adapt to dynamic workload variations and heterogeneous hardware conditions. This paper presents Dynamic Kernel Selection (DKS), a runtime framework that adaptively selects operator kernels during inference to optimize latency and energy efficiency. DKS integrates a lightweight profiler and decision engine that dynamically chooses among diverse kernel implementations, including precision variants (FP32, FP16, INT8), algorithmic strategies (GEMM, Winograd), and heterogeneous devices (CPU, GPU, edge accelerators). Experimental results on convolutional and transformer models (ResNet-50, MobileNetV3, BERT) show that DKS achieves up to 38.2% reduction in latency and 27.5% improvement in energy efficiency compared to state-of-the-art static baselines, while remaining within 5–7% of oracle performance. These findings highlight kernel-level adaptivity as a practical and scalable solution for efficient real-time ML inference across diverse platforms.
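The core idea described in the abstract, profiling several candidate kernel implementations at runtime and dispatching to the fastest one, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the names `select_kernel`, `gemm_naive`, and `gemm_transposed` are hypothetical stand-ins for the precision and algorithmic variants (FP32/FP16/INT8, GEMM/Winograd) that DKS would choose among, and the "profiler" here is a simple timing loop.

```python
# Illustrative sketch of dynamic kernel selection (hypothetical names;
# not the DKS implementation from the paper).
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

def gemm_naive(a: Matrix, b: Matrix) -> Matrix:
    """Triple-loop matrix multiply (stand-in for a baseline FP32 GEMM kernel)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out

def gemm_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Same product with b transposed first for better memory locality
    (stand-in for an alternative algorithmic strategy)."""
    bt = list(map(list, zip(*b)))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def select_kernel(kernels: Dict[str, Callable[[Matrix, Matrix], Matrix]],
                  a: Matrix, b: Matrix, trials: int = 3) -> Tuple[str, Callable]:
    """Lightweight profiler + decision engine: time each candidate kernel on
    the live input and return the (name, kernel) pair with the lowest
    median latency."""
    timings = {}
    for name, fn in kernels.items():
        samples = []
        for _ in range(trials):
            t0 = time.perf_counter()
            fn(a, b)
            samples.append(time.perf_counter() - t0)
        timings[name] = sorted(samples)[trials // 2]
    best = min(timings, key=timings.get)
    return best, kernels[best]

# Usage: dispatch one inference-time matmul through the selected kernel.
a = [[float(i + j) for j in range(32)] for i in range(32)]
b = [[float(i - j) for j in range(32)] for i in range(32)]
name, kernel = select_kernel({"naive": gemm_naive, "transposed": gemm_transposed}, a, b)
result = kernel(a, b)
```

In a real system the candidate set would also span precisions and devices, and the decision engine would amortize profiling cost across requests rather than re-timing every call; the gap to "oracle performance" mentioned in the abstract is exactly the overhead and misprediction cost of such a runtime selector.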
