Adaptive Retention and Eviction for Efficient Caching in AI-Driven Systems
Abstract
Caching is a fundamental optimization technique for reducing response latency, minimizing bandwidth consumption, and lowering computational overhead. GPTCache, a popular caching framework for large language models, provides vector-based semantic caching but exhibits notable limitations: it lacks adaptive eviction strategies, cannot efficiently handle high query volumes, and often triggers out-of-memory (OOM) failures when scaled to real-world workloads. This paper introduces an AI-driven hybrid caching mechanism that overcomes these drawbacks by adapting dynamically to system behavior. Our approach integrates machine learning algorithms, temporal trends, and context-aware features such as recency, frequency, miss penalties, and temporal variation. By utilizing graph-based representations and continuous feedback loops, the system learns and refines optimal cache policies in real time. Simulation results demonstrate significant improvements in cache hit rate, response latency, and memory utilization, especially under dynamic, large-scale query workloads. The proposed system thus provides a robust, scalable alternative to conventional static caching strategies, including GPTCache.
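To make the notion of context-aware retention concrete, the sketch below shows one way an eviction score could blend recency, frequency, and miss penalty, as summarized above. It is a minimal illustration only: the names (`AdaptiveCache`, `CacheEntry`, `miss_penalty`) and the fixed weights are assumptions for this sketch, and the full system would replace the hand-set weights with the learned, feedback-driven policy described in the paper.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Per-entry metadata tracked for scoring (illustrative names)."""
    value: str
    last_access: float = field(default_factory=time.monotonic)
    hit_count: int = 0
    miss_penalty: float = 1.0  # e.g. estimated recompute cost on a miss


class AdaptiveCache:
    """Toy cache whose eviction score combines recency, frequency, and miss penalty.

    The weights are fixed constants here; in the proposed system they would be
    tuned continuously by the learned feedback loop.
    """

    def __init__(self, capacity: int = 1000,
                 w_recency: float = 0.5, w_frequency: float = 0.3,
                 w_penalty: float = 0.2):
        self.capacity = capacity
        self.w_recency = w_recency
        self.w_frequency = w_frequency
        self.w_penalty = w_penalty
        self._store: dict[str, CacheEntry] = {}

    def _retention_score(self, entry: CacheEntry) -> float:
        """Higher score means more worth keeping; the lowest-scoring entry is evicted."""
        age = time.monotonic() - entry.last_access
        recency = 1.0 / (1.0 + age)  # decays the longer the entry goes unused
        return (self.w_recency * recency
                + self.w_frequency * entry.hit_count
                + self.w_penalty * entry.miss_penalty)

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        entry.hit_count += 1
        entry.last_access = time.monotonic()
        return entry.value

    def put(self, key: str, value: str, miss_penalty: float = 1.0) -> None:
        # Evict the least valuable entry when the cache is full.
        if key not in self._store and len(self._store) >= self.capacity:
            victim = min(self._store,
                         key=lambda k: self._retention_score(self._store[k]))
            del self._store[victim]
        self._store[key] = CacheEntry(value=value, miss_penalty=miss_penalty)
```

In this toy formulation the retention score is a weighted sum, which keeps the example short; the paper's approach instead learns the policy from recency, frequency, miss-penalty, and temporal-variation signals rather than relying on static weights.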