Reproducible High Performance Computing without Redundancy with Nix
Abstract
High performance computing (HPC) clusters are typically managed in a restrictive manner; the large user base makes cluster administrators unwilling to allow privilege escalation. Here we discuss existing methods of package management, including those which have been developed with scalability in mind, and enumerate the drawbacks and advantages of each management methodology. We contrast the paradigms of containerization via Docker, virtualization via KVM, pod infrastructures via Kubernetes, and specialized HPC packaging systems via Spack, and identify key areas of neglect. We demonstrate how functional programming, through its reliance on immutable state, has been leveraged for deterministic package management via nix-language expressions. We show that its associated ecosystem is a prime candidate for HPC package management. We further develop guidelines, identify bottlenecks in the existing structure, and present the methodology by which the nix ecosystem should be developed further as an optimal tool for HPC package management. We assert that the caveats of the nix ecosystem can easily be mitigated by considerations relevant only to HPC systems, without compromising on the functional methodology and features of the nix-language. We show that the benefits of adoption, in terms of generating reproducible derivations in a secure manner, allow workflows to be scaled across heterogeneous clusters. In particular, from the implementation hurdles faced during the compilation and running of the d-SEAMS scientific software engine, distributed as a nix derivation on an HPC cluster, we identify communication protocols for working with the SLURM and TORQUE user resource allocation queues. These protocols are heuristically defined and described in terms of the reference implementation required for queue-efficient nix builds.
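The determinism attributed to nix derivations rests on input addressing: a package's store path is computed from a hash of everything that goes into its build, so identical inputs always yield the identical path. The Python sketch below is a deliberately simplified toy model of that idea, not Nix's actual algorithm (Nix hashes a serialized derivation and uses its own base-32 encoding of a truncated digest). The function `store_path`, the package version, the dependency paths and the build command are all hypothetical placeholders chosen for illustration.

```python
import hashlib
import json


def store_path(name, version, build_inputs, builder):
    """Toy model (hypothetical, not Nix's real scheme) of an input-addressed path.

    Every input that can affect the build (name, version, dependency
    paths, build command) goes into the fingerprint, so identical inputs
    always map to the same path and any change yields a new one.
    """
    fingerprint = json.dumps(
        {
            "name": name,
            "version": version,
            # Dependency order should not matter, so sort before hashing.
            "inputs": sorted(build_inputs),
            "builder": builder,
        },
        sort_keys=True,  # canonical key order keeps the serialization stable
    )
    # Hex-encoded SHA-256, truncated to 32 characters for readability.
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}-{version}"


if __name__ == "__main__":
    # Hypothetical dependency paths and version, for illustration only.
    deps = ["/nix/store/hypothetical-gcc-9.3.0", "/nix/store/hypothetical-boost-1.72"]
    a = store_path("dseams", "1.0", deps, "cmake . && make")
    b = store_path("dseams", "1.0", list(reversed(deps)), "cmake . && make")
    c = store_path("dseams", "1.0",
                   ["/nix/store/hypothetical-gcc-10.2.0", deps[1]], "cmake . && make")
    print(a == b)  # same inputs in a different order: same path
    print(a == c)  # a different compiler: a different path
```

Because any change to an input produces a new path, differently configured builds of the same package can coexist side by side instead of overwriting one another.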