Spectral-Profile-Aware Low-Rank Compression for GPU Memory and Bandwidth Optimization

Abstract

GPU workloads in high-performance computing (HPC) and machine-learning inference are often limited by memory capacity and memory bandwidth rather than floating-point throughput. Low-rank factorization is a common strategy for reducing storage and memory traffic, yet it can also fail—sometimes increasing the memory footprint—when the rank required at a target error tolerance is too large. This paper makes the success/failure boundary explicit at the level of the singular spectrum. Using the Eckart–Young–Mirsky optimality identity and a minimal memory-traffic model, we relate the required rank k(ε) (for relative Frobenius tolerance ε) to the tail class of the singular values. We derive closed-form scalings for canonical tails and obtain a practical, vendor-agnostic decision rule: estimate the spectral tail, predict k(ε), and compress only when the predicted representation is memory-positive. A fully reproducible benchmark (Python script + CSV outputs) and a case study at N = 4096 illustrate the main point: at ε = 0.1 an exponential spectrum requires k = 16 and yields ∼1.3 × 10² storage reduction, whereas a borderline heavy tail requires k ≈ 3748 and yields no reduction. We also show how the achievable reduction scales with N at fixed tolerance.
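The decision rule sketched in the abstract can be illustrated in a few lines of Python. By Eckart–Young–Mirsky, the relative Frobenius error of the best rank-k truncation is the root-sum-square of the discarded singular values, so k(ε) can be read directly off the spectrum; the storage test then compares the two rank-k factors (2Nk entries for a square N × N matrix) against the dense N² entries. The specific spectra below are assumptions for illustration: an exponential spectrum with decay rate 0.15 (chosen so that k(0.1) = 16, matching the case study) and a borderline heavy tail sᵢ ∝ i^(−1/2); the paper's actual benchmark spectra may differ.

```python
import numpy as np

def rank_for_tolerance(s, eps):
    """Smallest rank k whose truncation meets relative Frobenius tolerance eps.

    By Eckart-Young-Mirsky, the best rank-k approximation error equals the
    root-sum-square of the discarded singular values, so k(eps) follows
    directly from the spectrum s (assumed sorted in descending order).
    """
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    rel_err = np.sqrt(np.maximum(1.0 - energy, 0.0))  # error of rank j+1 truncation
    hits = np.nonzero(rel_err <= eps)[0]
    return int(hits[0]) + 1 if hits.size else len(s)

def memory_positive(N, k):
    """Rank-k factors cost 2*N*k entries vs N*N dense (square-matrix model)."""
    return 2 * N * k < N * N

N, eps = 4096, 0.1

# Exponential spectrum (assumed decay rate 0.15, which reproduces k = 16).
s_exp = np.exp(-0.15 * np.arange(N))
k_exp = rank_for_tolerance(s_exp, eps)   # 16
ratio = (N * N) / (2 * N * k_exp)        # 128, i.e. ~1.3e2 storage reduction

# Borderline heavy tail s_i ~ i^(-1/2) (assumed form): nearly full rank needed,
# so the factored form would exceed the dense footprint.
s_heavy = np.arange(1, N + 1) ** -0.5
k_heavy = rank_for_tolerance(s_heavy, eps)
```

On the heavy-tailed spectrum the predicted rank lands near the k ≈ 3748 quoted in the abstract, and `memory_positive(N, k_heavy)` is false: the rule correctly declines to compress.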
