Claim Check + Semaphore-Based Queuing: A Fault-Tolerant Pattern for Distributed OCR/LLM Inference without Managed Infrastructure

Abstract

The Claim Check + Semaphore-Based Queuing (CCSQI) pattern is a formally specified, production-validated architecture for fault-tolerant GPU inference that replaces \$1,480--\$2,100/month of managed cloud infrastructure with five open-source components on commodity hardware. The pattern targets \emph{temporally decoupled} workloads: systems where producers and consumers operate in separate time windows and the SLA is completeness before a deadline, not real-time latency. For this workload class, reactive autoscaling solves a problem the architecture does not generate. [...] and \texttt{asyncio.shield} produce deadlock under 30+ concurrent tasks, with no exception and no log entry. The resolution follows from the actor model~\cite{hewitt1973}: liveness belongs to the broker, exclusion belongs to the semaphore. We additionally prove that a six-operation pipeline has exactly four structurally reachable failure modes, of which three are automatically repairable and one is preventable by operation ordering. The pattern is validated over 90 days on 4,000 documents spanning 24 classification types, with zero document loss and \$0 recurring infrastructure cost.
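
The split of responsibilities named in the abstract (the broker carries work, the semaphore bounds concurrency) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name its five components, so an in-memory `dict` stands in for the payload store and an `asyncio.Queue` stands in for the broker. Every identifier here (`payload_store`, `produce`, `consume`, `fake_inference`, `run_demo`) is hypothetical.

```python
import asyncio
import uuid

# Hypothetical stand-in for a durable payload store (assumption, not from the paper).
payload_store: dict[str, bytes] = {}


async def produce(queue: asyncio.Queue, document: bytes) -> str:
    """Claim check: store the payload, then enqueue only a small ticket."""
    claim_id = uuid.uuid4().hex
    payload_store[claim_id] = document  # payload is written first
    await queue.put(claim_id)           # only the ticket travels through the queue
    return claim_id


async def fake_inference(document: bytes) -> int:
    """Placeholder for the OCR/LLM call; returns the payload length."""
    await asyncio.sleep(0.01)
    return len(document)


async def consume(queue, gpu_slots, results, stats):
    """Redeem tickets; the semaphore alone limits concurrent inference."""
    while True:
        claim_id = await queue.get()
        try:
            async with gpu_slots:  # exclusion: at most n_slots tasks inside
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
                try:
                    document = payload_store[claim_id]
                    results[claim_id] = await fake_inference(document)
                finally:
                    stats["active"] -= 1
            # Delete the payload only after the result has been recorded.
            del payload_store[claim_id]
        finally:
            queue.task_done()


async def run_demo(n_docs: int = 40, n_workers: int = 8, n_slots: int = 2):
    queue: asyncio.Queue = asyncio.Queue()
    gpu_slots = asyncio.Semaphore(n_slots)
    results: dict[str, int] = {}
    stats = {"active": 0, "peak": 0}
    workers = [
        asyncio.create_task(consume(queue, gpu_slots, results, stats))
        for _ in range(n_workers)
    ]
    for i in range(n_docs):
        await produce(queue, b"x" * (i + 1))
    await queue.join()  # wait until every ticket has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, stats


if __name__ == "__main__":
    results, stats = asyncio.run(run_demo())
    print(len(results), "documents processed; peak concurrency", stats["peak"])
```

In this sketch the payload is deleted only after its result is recorded, so an interrupted task leaves the payload in the store rather than losing it. Whether this matches the operation ordering the paper proves correct is an assumption.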
