Computation-Proximate Architecture: An Actor-Based Pattern for Real-Time Calculations Over Billion-Row Datasets
Abstract
Organizations processing historical datasets with billions of rows face a fundamental architectural choice: move data to computation (the traditional query-centric approach) or move computation to data. This paper presents the Computation-Proximate Pattern, an architecture that combines actor-based in-memory computation with tiered caching and row-key value stores to achieve sub-15 ms response times over billion-row datasets. The pattern eliminates unnecessary data movement by resolving 70% of data lookups from a two-level cache hierarchy: 50% from in-process L1 caches (<1 µs) and 20% from distributed L2 caches (<500 µs). The remaining 30% are served through asynchronous parallel key-value lookups (<10 ms per get). We formalize the pattern's components, analyze its theoretical properties, including expected latency under the order statistics of parallel lookups, and validate it empirically on two production systems: a financial calculation engine processing 8 million records daily against 9 billion historical rows, and a telecommunications middleware handling 30,000 concurrent connections over 4 billion historical events. Median end-to-end latency is 3.2 ms (p99: 12.1 ms). Comparative analysis against batch-processing and direct-query approaches on the same workload demonstrates an order-of-magnitude latency reduction at significantly lower infrastructure cost. The pattern is technology-agnostic and applicable to any domain requiring real-time calculations over massive historical datasets.
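The read path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TieredLookup`, the method `get_many`, the dictionary-backed caches, and the simulated `kv_get` coroutine are all hypothetical stand-ins chosen for illustration. The sketch checks an in-process L1 cache first, then a shared L2 cache, and fetches whatever is still missing with concurrent key-value gets.

```python
import asyncio


class TieredLookup:
    """Hypothetical read path: in-process L1, shared L2, then a key-value store."""

    def __init__(self, l2, kv_get):
        self.l1 = {}          # in-process cache (a plain dict as a stand-in)
        self.l2 = l2          # stand-in for a distributed cache client
        self.kv_get = kv_get  # async callable key -> value (stand-in for a row-key store)
        self.hits = {"l1": 0, "l2": 0, "kv": 0}

    async def get_many(self, keys):
        found, misses = {}, []
        for key in keys:
            if key in self.l1:
                found[key] = self.l1[key]
                self.hits["l1"] += 1
            elif key in self.l2:
                # Promote the L2 hit into L1 so the next read stays in-process.
                found[key] = self.l1[key] = self.l2[key]
                self.hits["l2"] += 1
            else:
                misses.append(key)
        # Issue all remaining gets concurrently: the wait is governed by the
        # slowest single get rather than by the sum of all gets.
        values = await asyncio.gather(*(self.kv_get(k) for k in misses))
        for key, value in zip(misses, values):
            found[key] = self.l1[key] = self.l2[key] = value
            self.hits["kv"] += 1
        return found


async def demo():
    store = {f"row:{i}": i * 10 for i in range(10)}  # toy data, not from the paper

    async def kv_get(key):
        await asyncio.sleep(0.001)  # simulated store round trip
        return store[key]

    lookup = TieredLookup(l2={"row:1": 10}, kv_get=kv_get)
    lookup.l1["row:0"] = 0
    result = await lookup.get_many(["row:0", "row:1", "row:2", "row:3"])
    return result, lookup.hits


result, hits = asyncio.run(demo())
print(result, hits)
```

Because the missed keys are awaited together, the time spent on the key-value tier tracks the maximum of the individual get latencies, which is why the abstract frames expected latency in terms of order statistics of parallel lookups.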