Adaptive Retention and Eviction for Efficient Caching in AI-Driven Systems

Abstract

Caching is a fundamental optimization technique used to reduce response latency, minimize bandwidth consumption, and lower computational overhead. GPTCache, a popular caching framework for large language models, provides vector-based semantic caching but exhibits notable limitations: it lacks adaptive eviction strategies, cannot efficiently handle high query volumes, and often suffers out-of-memory (OOM) failures when scaled to real-world workloads. This paper introduces an AI-driven hybrid caching mechanism that overcomes these drawbacks by dynamically adapting to system behavior. Our approach integrates machine learning algorithms, temporal trends, and context-aware features such as recency, frequency, miss penalties, and temporal variation. By utilizing graph-based representations and continuous feedback loops, the system learns and evolves optimal cache policies in real time. Simulation results demonstrate significant improvements in cache hit rate, response latency, and memory utilization, especially under dynamic, large-scale query environments. The proposed system thus provides a robust, scalable alternative to conventional static caching strategies, including GPTCache.
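To make the feature set concrete, the sketch below shows one way a score-based eviction policy could combine recency, frequency, and miss penalty, with tunable weights that a feedback loop might adjust online. All names, weights, and the scoring formula here are illustrative assumptions, not the paper's actual implementation.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Entry:
    value: object
    miss_penalty: float  # estimated cost of recomputing the result on a miss
    hits: int = 0
    last_access: float = field(default_factory=time.monotonic)


class AdaptiveCache:
    """Toy score-based cache: on overflow, evicts the entry with the
    lowest retention score (recency + frequency + miss penalty)."""

    def __init__(self, capacity, w_recency=1.0, w_freq=1.0, w_penalty=1.0):
        self.capacity = capacity
        self.store = {}
        # Weights are hypothetical; a feedback loop could tune them
        # online from observed hit/miss statistics.
        self.w = (w_recency, w_freq, w_penalty)

    def _score(self, e, now):
        wr, wf, wp = self.w
        recency = 1.0 / (1.0 + now - e.last_access)  # decays with idle time
        return wr * recency + wf * e.hits + wp * e.miss_penalty

    def get(self, key):
        e = self.store.get(key)
        if e is None:
            return None
        e.hits += 1
        e.last_access = time.monotonic()
        return e.value

    def put(self, key, value, miss_penalty=1.0):
        if key not in self.store and len(self.store) >= self.capacity:
            now = time.monotonic()
            # Evict the entry whose retention score is lowest.
            victim = min(self.store, key=lambda k: self._score(self.store[k], now))
            del self.store[victim]
        self.store[key] = Entry(value, miss_penalty)
```

A semantic-caching layer in the style of GPTCache would key entries by embedding similarity rather than exact strings; the scoring idea is orthogonal and could sit behind either lookup scheme.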
