Design and Evaluation of a Heterogeneous DPU Architecture for Accelerating Post-Quantum Cryptography
Abstract
To remain secure against the threat of quantum computers, post-quantum cryptographic (PQC) algorithms have been introduced. These algorithms typically demand complex computations on very large numbers, posing substantial computational and integration challenges for hardware and system designers. In this work, we design a heterogeneous architecture that integrates a central processing unit (CPU) and an embedded graphics processing unit (GPU) within a data-processing unit (DPU), enabling PQC acceleration without host involvement. We then evaluate this architecture on two National Institute of Standards and Technology (NIST) PQC standards, the key-encapsulation mechanism ML-KEM (Kyber) and the digital signature scheme ML-DSA (Dilithium), on a DPU with an onboard GPU. Using the DPU’s onboard ARM cores and its A30 GPU, we benchmarked the task of generating and verifying 1000 digital signatures at once, comparing a CPU-only configuration against a hybrid CPU-GPU configuration. Our results show that for batch sizes of 10 and above, the heterogeneous architecture significantly outperforms the homogeneous (CPU-only) one, achieving a speedup of up to 84x. These results highlight the potential of DPUs to bridge cryptographic algorithm design and system engineering, enabling scalable, high-throughput PQC deployment in future secure data-center networks.