An Open Chisel-Based Framework for Hardware Acceleration on High-Performance FPGA Cards

Robin Gay
Tarek Ould-Bachir

Abstract

This paper presents an open and fully Chisel-based hardware acceleration framework tailored for high-performance FPGA platforms, with a specific focus on AMD/Xilinx Alveo UltraScale+ cards. While the high-level synthesis (HLS) flow offered by Xilinx enables rapid deployment and is well-suited for many applications, it can be overly abstract for low-level control scenarios such as ASIC prototyping. The alternative RTL Kernel flow offers finer control but often suffers from the limitations of legacy hardware description languages and the overhead of vendor-specific tooling. To address these limitations, we propose a fully open-source workflow based on Chisel, a modern hardware construction language embedded in Scala. Chisel combines the flexibility of object-oriented programming with the ability to generate synthesizable RTL, enabling scalable, reusable, and modular designs. Our framework demonstrates how Chisel can be used to implement advanced hardware features including AXI4/AXI4-Lite interfacing, multi-clock domain designs, asynchronous communication primitives, and enhanced simulation capabilities such as custom VCD trace generation. The use of the Vivado RTL flow bypasses the constraints imposed by the Xilinx golden image and XRT stack, allowing direct programming and fine-grained control over the FPGA fabric. Lightweight host communication is achieved via the XDMA IP and Linux device files, enabling platform-agnostic integration using standard programming languages such as C++ and Python. As a proof of concept, we implement a high-throughput matrix-vector multiplication engine for floating-point data in a self-alignment format (SAF), fully utilizing the resources of a multi-SLR Alveo U200 card. Benchmark results show efficient pipelined operation and full cross-SLR scalability, validating the viability of the proposed framework for custom acceleration pipelines.
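The host-side path described in the abstract (XDMA IP exposed through Linux device files) can be illustrated with a minimal Python sketch. This is not the authors' code: the device-node names follow the convention of the Xilinx XDMA reference driver and are an assumption here, as are the helper names `write_to_card` and `read_from_card`. The sketch only shows that positioned reads and writes on a file descriptor are enough for this style of host communication.

```python
import os

# Assumed device-node names, following the convention of the Xilinx XDMA
# reference driver; the actual names depend on the installed driver and card.
H2C_DEVICE = "/dev/xdma0_h2c_0"  # host-to-card DMA channel (assumption)
C2H_DEVICE = "/dev/xdma0_c2h_0"  # card-to-host DMA channel (assumption)


def write_to_card(path: str, offset: int, payload: bytes) -> int:
    """Write payload at the given byte offset; return the bytes written.

    Hypothetical helper: 'offset' is interpreted by the device as a
    card-side address, which is an assumption of this sketch.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.pwrite(fd, payload[written:], offset + written)
        return written
    finally:
        os.close(fd)


def read_from_card(path: str, offset: int, length: int) -> bytes:
    """Read up to 'length' bytes starting at the given byte offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = os.pread(fd, remaining, offset + (length - remaining))
            if not chunk:  # end of file or no more data available
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

On a machine with the driver loaded, a call such as `write_to_card(H2C_DEVICE, 0, data)` followed by `read_from_card(C2H_DEVICE, 0, len(data))` would move a buffer to the card and back, assuming the design maps address 0 to readable and writable memory. Because the helpers only rely on standard file operations, they can also be exercised against an ordinary file for testing.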

Version published to 10.20944/preprints202508.0984.v1
Aug 13, 2025
