StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

Abstract

We address LLM serving workloads in which repeated requests share a common solution structure but differ in localized constraints such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states, which are either brittle under partial changes or tightly coupled to specific backends. We present StepCache, a backend-agnostic step-level reuse layer that segments outputs into ordered steps, retrieves the best-matching cached request, verifies its steps using lightweight task-aware checks, and regenerates only the failing regions through selective patching. StepCache also enforces strict structured output for JSON through single-step extraction, required-key constraints, and one-shot repair, and includes conservative skip-reuse fallbacks for semantic changes. For linear equations, StepCache turns verification into correction via a bounded repair loop with a deterministic fallback that guarantees correctness when the backend model fails. In a CPU-only, perturbation-heavy micro-benchmark on math and JSON variants (10 base prompts per task with 3 variants per perturbation, totaling 222 evaluation requests per seed), averaged over three seeds (42–44), StepCache reduces mean latency from 2.13 seconds to 0.67 seconds (median: 2.42 seconds to 0.01 seconds; p95: 3.38 seconds to 3.30 seconds), reduces total token usage from 36.1k to 27.3k (162.7 to 123 tokens per request), and improves end-to-end correctness from 72.5% to 100% under both task-specific checks and a stitched-output integrity check. Across requests, 79.7% take the reuse-only fast path, 5.4% require patching, and 14.9% trigger skip-reuse.
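
The reuse pipeline summarized in the abstract (retrieve the closest cached request, verify its steps one by one, patch only the failing steps, or skip reuse entirely) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not StepCache's actual implementation: the hypothetical `CacheEntry` type, the line-based `segment` function, `difflib` similarity as the retrieval score, the `min_similarity` threshold standing in for the skip-reuse decision, and the `verify`, `patch_step`, and `generate` callables that would wrap task-aware checks and backend calls.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass
class CacheEntry:
    prompt: str
    steps: List[str]  # cached output, already segmented into ordered steps


def segment(output: str) -> List[str]:
    """Hypothetical segmenter: one step per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def similarity(a: str, b: str) -> float:
    """Hypothetical retrieval score: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def serve(
    prompt: str,
    cache: List[CacheEntry],
    verify: Callable[[str, int, str], bool],           # (prompt, index, step) -> ok?
    patch_step: Callable[[str, int, List[str]], str],  # regenerate a single step
    generate: Callable[[str], str],                    # full backend generation
    min_similarity: float = 0.6,                       # hypothetical skip threshold
) -> Tuple[List[str], str]:
    """Return (steps, path), where path is 'reuse', 'patch', or 'skip'."""
    entry: Optional[CacheEntry] = max(
        cache, key=lambda e: similarity(prompt, e.prompt), default=None
    )
    if entry is None or similarity(prompt, entry.prompt) < min_similarity:
        # Skip-reuse fallback: no close match, so generate from scratch and cache it.
        steps = segment(generate(prompt))
        cache.append(CacheEntry(prompt, steps))
        return steps, "skip"
    steps = list(entry.steps)  # copy so the cached entry is never mutated
    patched = False
    for i, step in enumerate(steps):
        if not verify(prompt, i, step):
            # Selective patching: regenerate only the failing step,
            # passing the current step list as context.
            steps[i] = patch_step(prompt, i, steps)
            patched = True
    return steps, "patch" if patched else "reuse"


# Usage: the new request differs from the cached one only in the output unit.
cache = [CacheEntry("Add 2 and 3; report in meters",
                    ["Operands: 2 and 3", "Sum: 2 + 3 = 5", "Answer: 5 meters"])]


def unit_of(prompt: str) -> str:
    return prompt.rsplit(" ", 1)[-1]


def verify(prompt: str, i: int, step: str) -> bool:
    # Toy task-aware check: steps 0 and 1 always pass; the last must name the unit.
    return i < 2 or step.endswith(unit_of(prompt))


def patch_step(prompt: str, i: int, steps: List[str]) -> str:
    return f"Answer: 5 {unit_of(prompt)}"  # stand-in for a backend call


steps, path = serve("Add 2 and 3; report in feet", cache, verify, patch_step,
                    generate=lambda p: "")
print(path, steps[-1])  # patch Answer: 5 feet
```

The point the sketch tries to capture is that verification runs per step, so a request that differs only in a localized constraint (here, the unit) pays for one regenerated step rather than a full response.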
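
The JSON enforcement path (single-step extraction, required-key constraints, one-shot repair) might be sketched as follows. This assumes "single-step extraction" means treating the whole JSON payload as one step and pulling a single object out of the response; that reading, the helpers `extract_json` and `enforce_json`, and the `repair` callable (a stand-in for exactly one backend call) are hypothetical. Only `json.loads` and `json.JSONDecodeError` are standard-library APIs.

```python
import json
from typing import Callable, Dict, List, Optional


def extract_json(text: str) -> Optional[Dict]:
    """Extract one JSON object: the span from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        obj = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def enforce_json(
    output: str,
    required_keys: List[str],
    repair: Callable[[str, List[str]], str],  # hypothetical single backend call
) -> Optional[Dict]:
    """Return a dict with all required keys, or None if the one repair fails."""
    obj = extract_json(output)
    missing = required_keys if obj is None else [k for k in required_keys if k not in obj]
    if obj is not None and not missing:
        return obj
    # One-shot repair: exactly one retry, told which keys are missing.
    fixed = extract_json(repair(output, missing))
    if fixed is not None and all(k in fixed for k in required_keys):
        return fixed
    return None  # caller falls back, e.g. to full generation


raw = 'Sure! {"name": "Ada", "age": 36}'
result = enforce_json(raw, ["name", "age", "email"],
                      repair=lambda out, miss: '{"name": "Ada", "age": 36, "email": null}')
print(result)  # {'name': 'Ada', 'age': 36, 'email': None}
```

Returning None instead of retrying again keeps the repair cost bounded to a single extra call, leaving the decision about a heavier fallback to the caller.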
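
For linear equations, a bounded repair loop with a deterministic fallback could look like the sketch below, written for equations of the form `a*x + b = c`. The equation form, the hypothetical `propose` callable (a stand-in for the backend model that receives feedback after a failed check), and the `max_repairs` bound are assumptions. The fallback uses exact rational arithmetic via `fractions.Fraction`, so neither the substitution check nor the closed-form answer suffers from floating-point error.

```python
from fractions import Fraction
from typing import Callable, Optional, Tuple


def solve_linear(
    a: int, b: int, c: int,
    propose: Callable[[int, str], Optional[Fraction]],  # hypothetical backend call
    max_repairs: int = 2,
) -> Tuple[Fraction, str]:
    """Solve a*x + b = c; return (x, source) with source 'model' or 'fallback'."""
    if a == 0:
        raise ValueError("not a linear equation in x: a must be nonzero")
    feedback = ""
    for attempt in range(1 + max_repairs):  # bounded: one try plus max_repairs retries
        x = propose(attempt, feedback)
        if x is not None and a * x + b == c:  # verify by substitution
            return Fraction(x), "model"
        residual = "no answer" if x is None else f"a*x + b - c = {a * x + b - c}"
        feedback = f"x = {x} is wrong ({residual}); please correct it."
    # Deterministic fallback: exact closed-form solution x = (c - b) / a.
    return Fraction(c - b, a), "fallback"


answers = [4, 5]  # stand-in model: wrong first, corrected after feedback
print(solve_linear(3, 4, 19, lambda attempt, fb: answers[attempt]))
# (Fraction(5, 1), 'model')
print(solve_linear(3, 4, 19, lambda attempt, fb: 4))
# (Fraction(5, 1), 'fallback')
```

Because the fallback is computed rather than generated, the returned value satisfies the equation whether or not the model ever produces a correct answer, which is the sense in which such a loop can guarantee correctness.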
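
The reported figures can be cross-checked with simple arithmetic (our own computation from the numbers above, not additional results). The per-request token counts are consistent with dividing the seed-averaged totals by the 222 requests; the small gap between 162.6 and the reported 162.7 is consistent with the total being rounded to 36.1k.

```latex
\begin{aligned}
\text{tokens per request:}\quad & \frac{36.1\,\text{k}}{222} \approx 162.6 \;\to\; \frac{27.3\,\text{k}}{222} \approx 123.0 \\
\text{token reduction:}\quad & 1 - \frac{27.3}{36.1} \approx 0.244 \quad (\approx 24\%) \\
\text{mean-latency reduction:}\quad & 1 - \frac{0.67}{2.13} \approx 0.685 \quad (\approx 3.2\times\ \text{speedup}) \\
\text{path shares:}\quad & 79.7\% + 5.4\% + 14.9\% = 100\%
\end{aligned}
```

Since 79.7% of requests take the reuse-only fast path, the median request plausibly lies on that path, which is consistent with the 0.01-second median; the nearly unchanged p95 is likewise consistent with the 14.9% of skip-reuse requests still paying for full generation.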
