First Fully Pipelined High Throughput FPGA Implementation and GPU Optimization of Wider Variant of AES


Abstract

In response to the recent NIST call for a wider variant of the AES algorithm, we developed a fully pipelined, high-throughput FPGA implementation of AES with a 256-bit block size, referred to as WAES-256. The design targets both 7th generation and UltraScale+ FPGAs and focuses on maximizing throughput while using hardware efficiently. Our work supports AES-128, AES-256, and WAES-256, and employs composite field arithmetic in the S-box to reduce critical path delay. All AES layers are fully pipelined, enabling multiple levels of parallelism with minimal architectural changes. Our AES-128 implementations achieve the best throughput-per-slice (TPS) ratios reported in the literature when compared fairly on the same FPGA platforms. For WAES-256, our designs reach 75.73 Gbps on Spartan-7, 72.32 Gbps on Artix-7, 199.46 Gbps on Zynq UltraScale+, and 206.11 Gbps on Kintex UltraScale+. Additionally, our multi-core parallel WAES-256 designs achieve 426.66 Gbps with two cores and 742.63 Gbps with four cores on the Kintex UltraScale+ platform. These results demonstrate the efficiency and scalability of our architectures, which deliver high throughput without relying on BRAM and are therefore well suited to next-generation cryptographic applications. Moreover, we optimized WAES-256 on GPUs and achieved performance comparable to the best reported AES-256 results. For instance, we achieved 3053.5 Gbps for WAES-256 encryption in counter mode on an RTX 4090. Our results show that using FPGAs or GPUs as co-processors for WAES-256 renders encryption effectively free, and that transitioning from AES-256 to WAES-256 results in no observable slowdown.
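The throughput figures above can be put in context with a little arithmetic. The following Python sketch is illustrative only and is not taken from the article: it assumes that "fully pipelined" means one 256-bit block leaves the pipeline per clock cycle, so that throughput equals block size times clock frequency, and it derives the clock rate each figure would imply under that assumption. It also compares the multi-core Kintex UltraScale+ results against the single-core figure. The function names are hypothetical.

```python
# WAES-256 throughput figures quoted in the abstract, in Gbps.
WAES256_GBPS = {
    "Spartan-7": 75.73,
    "Artix-7": 72.32,
    "Zynq UltraScale+": 199.46,
    "Kintex UltraScale+": 206.11,
}
# Multi-core results on Kintex UltraScale+ (cores -> Gbps), also from the abstract.
KINTEX_MULTICORE_GBPS = {2: 426.66, 4: 742.63}

BLOCK_BITS = 256  # WAES-256 block size


def implied_clock_mhz(gbps, block_bits=BLOCK_BITS):
    """Clock rate implied by a throughput figure.

    ASSUMPTION (not stated in the abstract): the pipeline emits exactly
    one block per cycle, so throughput = block_bits * f_clk.
    """
    return gbps * 1e9 / block_bits / 1e6


def per_core_gbps(total_gbps, cores):
    """Average throughput contributed by each core."""
    return total_gbps / cores


def scaling_efficiency(total_gbps, cores, single_core_gbps):
    """Ratio of measured multi-core throughput to ideal linear scaling."""
    return total_gbps / (cores * single_core_gbps)


if __name__ == "__main__":
    for board, gbps in WAES256_GBPS.items():
        print(f"{board}: {gbps} Gbps -> {implied_clock_mhz(gbps):.1f} MHz implied")

    single = WAES256_GBPS["Kintex UltraScale+"]
    for cores, total in KINTEX_MULTICORE_GBPS.items():
        eff = scaling_efficiency(total, cores, single)
        print(f"x{cores}: {per_core_gbps(total, cores):.2f} Gbps per core, "
              f"efficiency {eff:.3f}")
```

An efficiency near 1 corresponds to near-linear scaling. A value slightly above 1 would suggest that the multi-core build closed timing at a higher clock than the single-core build, and a value below 1 would suggest a lower clock; the abstract itself does not report clock frequencies, so these are inferences under the stated assumption.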
