Adaptive Retention and Eviction for Efficient Caching in AI-Driven Systems
Abstract
Caching is a fundamental optimization technique for reducing response latency, minimizing bandwidth consumption, and lowering computational overhead. GPTCache, a popular caching framework for large language models, provides vector-based semantic caching but exhibits notable limitations: it lacks adaptive eviction strategies, cannot efficiently handle high query volumes, and often triggers out-of-memory (OOM) failures when scaled to real-world workloads. This paper introduces an AI-driven hybrid caching mechanism that overcomes these drawbacks by adapting dynamically to system behavior. Our approach integrates machine learning algorithms, temporal trends, and context-aware features such as recency, frequency, miss penalties, and temporal variation. By utilizing graph-based representations and continuous feedback loops, the system learns and refines optimal cache policies in real time. Simulation results demonstrate significant improvements in cache hit rate, response latency, and memory utilization, especially under dynamic, large-scale query workloads. The proposed system thus provides a robust, scalable alternative to conventional static caching strategies, including GPTCache.
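To make the notion of context-aware retention concrete, the sketch below shows one way an eviction score could blend recency, frequency, and miss penalty, as summarized above. It is a minimal illustration only: the names (`AdaptiveCache`, `CacheEntry`, `miss_penalty`) and the fixed weights are assumptions for this sketch, and the full system would replace the hand-set weights with the learned, feedback-driven policy described in the paper.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Per-entry metadata tracked for scoring (illustrative names)."""
    value: str
    last_access: float = field(default_factory=time.monotonic)
    hit_count: int = 0
    miss_penalty: float = 1.0  # e.g. estimated recompute cost on a miss


class AdaptiveCache:
    """Toy cache whose eviction score combines recency, frequency, and miss penalty.

    The weights are fixed constants here; in the proposed system they would be
    tuned continuously by the learned feedback loop.
    """

    def __init__(self, capacity: int = 1000,
                 w_recency: float = 0.5, w_frequency: float = 0.3,
                 w_penalty: float = 0.2):
        self.capacity = capacity
        self.w_recency = w_recency
        self.w_frequency = w_frequency
        self.w_penalty = w_penalty
        self._store: dict[str, CacheEntry] = {}

    def _retention_score(self, entry: CacheEntry) -> float:
        """Higher score means more worth keeping; the lowest-scoring entry is evicted."""
        age = time.monotonic() - entry.last_access
        recency = 1.0 / (1.0 + age)  # decays the longer the entry goes unused
        return (self.w_recency * recency
                + self.w_frequency * entry.hit_count
                + self.w_penalty * entry.miss_penalty)

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        entry.hit_count += 1
        entry.last_access = time.monotonic()
        return entry.value

    def put(self, key: str, value: str, miss_penalty: float = 1.0) -> None:
        # Evict the least valuable entry when the cache is full.
        if key not in self._store and len(self._store) >= self.capacity:
            victim = min(self._store,
                         key=lambda k: self._retention_score(self._store[k]))
            del self._store[victim]
        self._store[key] = CacheEntry(value=value, miss_penalty=miss_penalty)
```

In this toy formulation the retention score is a weighted sum, which keeps the example short; the paper's approach instead learns the policy from recency, frequency, miss-penalty, and temporal-variation signals rather than relying on static weights.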