Claim Check + Semaphore-Based Queuing: A Fault-Tolerant Pattern for Distributed OCR/LLM Inference without Managed Infrastructure
Abstract
[...] (CCSQI) pattern: a formally specified, production-validated architecture for fault-tolerant GPU inference that replaces \$1,480--\$2,100/month of managed cloud infrastructure with five open-source components on commodity hardware. The pattern targets \emph{temporally decoupled} workloads: systems where producers and consumers operate in separate time windows and the SLA is completeness before a deadline, not real-time latency. For this workload class, reactive autoscaling solves a problem the architecture does not generate. [...] and \texttt{asyncio.shield} produce deadlock under 30+ concurrent tasks, with no exception and no log entry. The resolution follows from the actor model~\cite{hewitt1973}: liveness belongs to the broker, exclusion belongs to the semaphore. We additionally prove that a six-operation pipeline has exactly four structurally reachable failure modes, of which three are automatically repairable and one is preventable by operation ordering. The pattern is validated over 90 days, 4,000 documents, and 24 classification types, with zero document loss and \$0 recurring infrastructure cost.
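The two ideas named in the title can be illustrated with a minimal sketch. This is not the authors' implementation: `ClaimStore`, `infer`, `worker`, and `run_batch` are hypothetical names, the store is an in-memory stand-in for whatever blob storage the pattern uses, and the broker and the deadlock scenario are not modeled. The sketch only shows tasks carrying a small claim token instead of the document payload, and an `asyncio.Semaphore` limiting how many tasks enter the inference step at once.

```python
import asyncio
import uuid


class ClaimStore:
    """Hypothetical in-memory stand-in for a blob store (claim check)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        # Store the heavy payload and hand back a small claim token.
        claim = uuid.uuid4().hex
        self._blobs[claim] = payload
        return claim

    def get(self, claim: str) -> bytes:
        return self._blobs[claim]


async def infer(payload: bytes) -> str:
    # Placeholder for an OCR/LLM call; assumed to be the GPU-bound step.
    await asyncio.sleep(0.01)
    return f"{len(payload)} bytes processed"


async def worker(claim: str, store: ClaimStore,
                 gpu_slots: asyncio.Semaphore, stats: dict) -> str:
    # The semaphore provides exclusion only: at most N tasks hold a slot.
    async with gpu_slots:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            # Redeem the claim only once a slot is held.
            return await infer(store.get(claim))
        finally:
            stats["active"] -= 1


async def run_batch(payloads: list[bytes], max_concurrent: int = 2):
    store = ClaimStore()
    gpu_slots = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    # Only claim tokens travel with the tasks, never the payloads.
    claims = [store.put(p) for p in payloads]
    results = await asyncio.gather(
        *(worker(c, store, gpu_slots, stats) for c in claims)
    )
    return results, stats["peak"]


if __name__ == "__main__":
    docs = [b"x" * n for n in range(1, 31)]
    results, peak = asyncio.run(run_batch(docs, max_concurrent=2))
    print(len(results), "documents, peak concurrency", peak)
```

In this sketch the semaphore is responsible for nothing except bounding concurrency; detecting stalled or lost work would fall to a separate component, in line with the abstract's division of liveness and exclusion.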