Fast and accurate algorithms for matrix multiplication using fused multiply-add and their rounding error analysis

Katsuhisa Ozaki
Toru Koizumi

Abstract

We propose numerical algorithms for accurate matrix multiplication. When standard floating-point arithmetic fails to deliver sufficient accuracy, high-precision techniques such as pair arithmetic and double-word arithmetic are typically employed, but at considerable computational cost. The proposed algorithms attain high accuracy by efficiently exploiting fused multiply-add (FMA) operations, and they are less computationally expensive than these conventional high-precision methods. Numerical experiments confirm their effectiveness, particularly in environments where dedicated floating-point adder units are available. Although their accuracy is slightly inferior to that of pair arithmetic or double-word arithmetic, the experiments show that they run about two to three times faster. Comparative evaluations against accurate GEMM-based algorithms are conducted on both CPUs and GPUs, and rounding error analyses of the proposed algorithms are also provided.
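For readers unfamiliar with the background techniques named in the abstract, the following is a minimal sketch of how an FMA exposes the rounding error of a product, and how a pair-arithmetic-style compensated dot product can use it. It is a generic textbook illustration, not the algorithm proposed in the preprint. All function names (`fma`, `two_prod_fma`, `two_sum`, `dot_compensated`, `matmul_compensated`) are hypothetical, and the FMA is emulated with exact rational arithmetic so the sketch does not depend on hardware or on a particular Python version.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c computed exactly, rounded once.

    Assumption: this stands in for a hardware FMA (newer Python versions
    also provide math.fma). Fraction arithmetic is exact, and converting
    the result to float rounds to nearest.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_prod_fma(a: float, b: float) -> tuple[float, float]:
    """Return (p, e) with p = fl(a*b) and p + e == a*b, barring over/underflow."""
    p = a * b
    e = fma(a, b, -p)  # the rounding error of the product, recovered exactly
    return p, e


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) with s = fl(a+b) and s + e == a + b (Knuth's TwoSum)."""
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e


def dot_compensated(x, y) -> float:
    """Dot product that accumulates the rounding errors in a second word."""
    s = 0.0  # running sum (high part)
    c = 0.0  # accumulated rounding errors (low part)
    for xi, yi in zip(x, y):
        p, ep = two_prod_fma(xi, yi)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c


def matmul_compensated(A, B):
    """Matrix product over lists of rows, one compensated dot per entry."""
    cols = list(zip(*B))
    return [[dot_compensated(row, col) for col in cols] for row in A]
```

As a usage example, `dot_compensated([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0`, whereas summing the same products left to right in plain double precision returns `0.0`. The extra `two_sum` and error accumulation per term is the kind of overhead the abstract refers to when it describes pair arithmetic as computationally expensive.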

Version published to 10.21203/rs.3.rs-8242254/v1 on Research Square
Dec 17, 2025

Related articles

A Review of Floating-Point Arithmetic Algorithms Using Taylor Series Expansion and Mantissa Region Division Techniques

This article has 2 authors:
1. Jianglin Wei
2. Haruo Kobayashi
This article has no evaluations. Latest version: Jan 5, 2026

Miller-Stable s-Step Conjugate Gradient and Conjugate Residual Methods

This article has 1 author:
1. Stephen Thomas
This article has no evaluations. Latest version: Feb 4, 2026

GPU-NTT and Karatsuba Co-Optimization for High-Throughput Polynomial Multiplication Acceleration

This article has 4 authors:
1. Ruwei Huang
2. Xiaolong Tang
3. Junjie Wang
4. Xuezheng Qin
This article has no evaluations. Latest version: Jan 19, 2026
