Real-Time Quantized YOLO Object Detection on Serverless Cloud Functions: An Experimental and Analytical Study

Abstract

Serverless computing has emerged as an attractive execution model for event-driven applications due to its automatic scalability, fine-grained billing, and minimal operational overhead. These properties make serverless platforms appealing for machine learning inference workloads with variable or bursty demand. However, deploying real-time computer vision pipelines on serverless infrastructures remains challenging due to CPU-only execution, memory constraints, and cold-start overheads [5, 6]. This paper presents a systematic experimental and analytical study of quantized object detection deployed on serverless cloud functions. Using quantized YOLO-based models executed on AWS Lambda with ONNX Runtime, we investigate latency–cost trade-offs under practical deployment constraints. Experimental results show that INT8 quantization substantially reduces warm-start inference latency, while cold-start behavior remains the dominant bottleneck.
