SAIA: A Seamless Slurm-Native Solution for HPC-Based Services
Abstract
Recent developments indicate a shift toward web services that employ ever larger AI models, e.g., Large Language Models (LLMs), which require powerful hardware for inference. High-Performance Computing (HPC) systems are commonly equipped with such hardware for large-scale computation tasks. However, HPC infrastructure is inherently unsuitable for hosting real-time web services due to network, security, and scheduling constraints. While various efforts exist to integrate external scheduling solutions, these often compromise security or usability for existing HPC users. In this paper, we present SAIA, a Slurm-native platform consisting of a scheduler and a proxy. The scheduler interacts with Slurm to ensure the availability and scalability of services, while the proxy provides external access, secured via confined SSH commands. We have demonstrated SAIA’s applicability by deploying a large-scale LLM web service that has served over 50,000 users.