Reducing Latency in Large-Scale Data Systems Through Intelligent Memory Tiering and Offloading Mechanisms

Abstract

The exponential growth of data volumes, driven by IoT, AI, and real-time analytics, has placed unprecedented demands on large-scale data systems. A critical bottleneck in these systems is memory access latency, which directly affects application performance and user experience. Traditional homogeneous memory architectures, which rely primarily on Dynamic Random-Access Memory (DRAM), are increasingly insufficient due to cost, power, and density constraints. This paper proposes a novel framework for intelligent memory tiering and offloading built on a heterogeneous memory hierarchy that integrates DRAM with emerging non-volatile memory (NVM) technologies and fast NVMe storage. Our approach employs a lightweight machine-learning-based profiler to dynamically classify data access patterns as hot, warm, or cold, together with an intelligent data placement engine that migrates data between memory tiers (DRAM, NVM, NVMe SSD) to minimize access latency. We also introduce a proactive offloading mechanism that preemptively moves data likely to be accessed by batch or analytical workloads to a high-throughput storage layer, reducing contention on the primary memory bus. A simulation-based evaluation demonstrates that the proposed framework can reduce overall tail latency by up to 45% and improve system throughput by 30% compared with traditional LRU-based caching and uniform memory architectures, while also achieving a 20% reduction in total cost of ownership (TCO). This research provides a viable pathway toward next-generation data systems that meet the low-latency requirements of modern data-intensive applications.
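To make the tiering idea concrete, the Python sketch below illustrates the general hot/warm/cold placement scheme the abstract describes, under simplifying assumptions: in place of the paper's machine-learning-based profiler it uses a plain exponentially decayed access-frequency heuristic, and the class names, thresholds, and decay factor are hypothetical illustrations rather than values taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DRAM = 0      # lowest latency, highest cost per GB
    NVM = 1       # intermediate latency and cost (e.g., persistent memory)
    NVME_SSD = 2  # highest latency, cheapest capacity


@dataclass
class PageStats:
    score: float = 0.0        # exponentially decayed access frequency
    tier: Tier = Tier.NVME_SSD


class TieringEngine:
    """Classify pages as hot/warm/cold from decayed access counts and
    place each page on the matching tier. A stand-in for the paper's
    ML profiler; thresholds and decay are illustrative assumptions."""

    def __init__(self, hot_threshold=8.0, warm_threshold=2.0, decay=0.5):
        self.hot_threshold = hot_threshold
        self.warm_threshold = warm_threshold
        self.decay = decay
        self.pages: dict[int, PageStats] = {}

    def record_access(self, page_id: int) -> None:
        # Called on every access; bumps the page's popularity score.
        self.pages.setdefault(page_id, PageStats()).score += 1.0

    def classify(self, score: float) -> Tier:
        if score >= self.hot_threshold:
            return Tier.DRAM      # hot: keep in fastest memory
        if score >= self.warm_threshold:
            return Tier.NVM       # warm: intermediate tier
        return Tier.NVME_SSD      # cold: offload to capacity storage

    def rebalance(self) -> list[tuple[int, Tier, Tier]]:
        """Run at the end of each profiling epoch: decay scores so stale
        popularity ages out, then emit (page, from_tier, to_tier) moves."""
        migrations = []
        for page_id, stats in self.pages.items():
            stats.score *= self.decay
            target = self.classify(stats.score)
            if target != stats.tier:
                migrations.append((page_id, stats.tier, target))
                stats.tier = target
        return migrations


if __name__ == "__main__":
    engine = TieringEngine()
    for _ in range(20):
        engine.record_access(page_id=1)   # page 1 becomes hot
    engine.record_access(page_id=2)       # page 2 stays cold
    for move in engine.rebalance():
        print(move)  # e.g., (1, Tier.NVME_SSD, Tier.DRAM)
```

A real placement engine would also account for tier capacity limits and migration cost, and the proactive offloading described in the abstract would move data expected by batch workloads to the storage layer ahead of demand; this sketch shows only the epoch-based classify-and-migrate loop.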
