Spectral-Profile-Aware Low-Rank Compression for GPU Memory and Bandwidth Optimization
Abstract
GPU workloads in high-performance computing (HPC) and machine-learning inference are often limited by memory capacity and memory bandwidth rather than floating-point throughput. Low-rank factorization is a common strategy for reducing storage and memory traffic, yet it can also fail (sometimes even increasing the memory footprint) when the rank required at a target error tolerance is too large. This paper makes the success/failure boundary explicit at the level of the singular spectrum. Using the Eckart–Young–Mirsky optimality identity and a minimal memory-traffic model, we relate the required rank k(ε) (for relative Frobenius tolerance ε) to the tail class of the singular values. We derive closed-form scalings for canonical tails and obtain a practical, vendor-agnostic decision rule: estimate the spectral tail, predict k(ε), and compress only when the predicted representation is memory-positive. A fully reproducible benchmark (Python script + CSV outputs) and a case study at N = 4096 illustrate the main point: at ε = 0.1, an exponential spectrum requires k = 16 and yields a ∼1.3 × 10² storage reduction, whereas a borderline heavy tail requires k ≈ 3748 and yields no reduction. We also show how the achievable reduction scales with N at fixed tolerance.
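The decision rule summarized above can be sketched in a few lines of Python. The sketch below is illustrative, not the paper's benchmark code: the exponential decay rate used for the example spectrum is a hypothetical choice, and the storage model counts only the raw entries of the rank-k factors versus the dense matrix.

```python
import numpy as np


def required_rank(s: np.ndarray, eps: float) -> int:
    """Smallest k with ||A - A_k||_F / ||A||_F <= eps.

    By the Eckart-Young-Mirsky identity, the squared Frobenius error of the
    best rank-k approximation equals the sum of the trailing squared
    singular values, so k(eps) is read off directly from the spectrum.
    """
    s2 = np.sort(np.asarray(s, dtype=float))[::-1] ** 2  # descending sigma_i^2
    total = s2.sum()
    tail = total - np.cumsum(s2)  # tail[j] = sum of s2[j+1:]
    return int(np.argmax(tail <= (eps ** 2) * total)) + 1


def memory_positive(m: int, n: int, k: int) -> bool:
    """Compression pays off only if the rank-k factors U (m x k) and
    V (k x n) need fewer entries than the dense matrix: k(m + n) < m*n."""
    return k * (m + n) < m * n


# Hypothetical exponentially decaying spectrum at N = 4096
# (decay rate 0.15 chosen for illustration).
N, eps = 4096, 0.1
sigma = np.exp(-0.15 * np.arange(N))
k = required_rank(sigma, eps)
print(k, memory_positive(N, N, k))
```

For a square N x N matrix the break-even rank is N/2, which is why a borderline heavy tail with k ≈ 3748 at N = 4096 yields no reduction even though k < N.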