A Kolmogorov–Arnold Compute-in-Memory (KA-CIM) Hardware Accelerator with High Energy Efficiency and Flexibility

Abstract

The Kolmogorov-Arnold Network (KAN) is an emerging AI model designed for AI+Science applications, offering up to 100x fewer parameters than conventional Multilayer Perceptrons (MLPs). Unlike MLPs, whose workload is dominated by matrix multiplication, KAN relies on computationally expensive non-linear functions. This limits KAN's compatibility with energy-efficient hardware accelerators such as Compute-in-Memory (CIM), restricting it to energy-inefficient general-purpose chips. To address this gap, we propose KA-CIM, a memory-centric design for energy-efficient KAN inference. We leverage the Piece-Wise Linear (PWL) approximation of non-linear functions to convert complex computations into Multiply-Accumulate (MAC) operations and to pre-store segment parameters, an approach well suited to memory-centric design. A specialized CIM unit dynamically routes each input to its PWL segment, while a crossbar array retrieves the segment's slope and intercept to execute the MAC operation. A tile-partitioning feature supports a larger number of PWL segments for improved accuracy without a significant energy penalty. This architecture enables efficient and flexible computation of the arbitrary non-linear functions in KANs. Beyond KAN inference, KA-CIM also supports multi-variable equations and derivative computations. KA-CIM achieves an energy-delay product 1073x lower than a CPU for non-linear operations. When executing KAN, it achieves a 77x lower energy-delay product than a 100 TOPS/W CIM accelerator executing an MLP for the same task.
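To make the PWL scheme concrete, the sketch below shows the software analogue of the abstract's description: per-segment slopes and intercepts are precomputed (in KA-CIM these would be pre-stored in the crossbar array), an input is routed to its segment (the role of the specialized CIM unit), and the result is a single MAC, y = slope * x + intercept. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation; the function names, the SiLU example activation, and the segment count are all hypothetical.

```python
import numpy as np

def build_pwl_table(fn, x_min, x_max, n_segments):
    """Precompute per-segment slopes and intercepts for fn on [x_min, x_max].

    In KA-CIM these parameters would be pre-stored in the crossbar array;
    here they are plain NumPy arrays (illustrative only).
    """
    breakpoints = np.linspace(x_min, x_max, n_segments + 1)
    x0, x1 = breakpoints[:-1], breakpoints[1:]
    slopes = (fn(x1) - fn(x0)) / (x1 - x0)          # secant slope per segment
    intercepts = fn(x0) - slopes * x0               # y-intercept per segment
    return breakpoints, slopes, intercepts

def pwl_eval(x, breakpoints, slopes, intercepts):
    """Route each input to its segment, then evaluate one MAC per input."""
    # Segment routing: find which interval each x falls into
    # (the specialized CIM unit's job in the paper's design).
    idx = np.clip(np.searchsorted(breakpoints, x) - 1, 0, len(slopes) - 1)
    # The non-linear function collapses to a single multiply-accumulate.
    return slopes[idx] * x + intercepts[idx]

# Example: approximate SiLU, a common smooth activation, with 16 segments.
silu = lambda x: x / (1.0 + np.exp(-x))
bp, m, b = build_pwl_table(silu, -4.0, 4.0, n_segments=16)

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(pwl_eval(x, bp, m, b) - silu(x))))  # approximation error
```

Increasing `n_segments` tightens the approximation, which mirrors the abstract's point that tile partitioning lets KA-CIM use more PWL segments for accuracy without a significant energy cost.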
