Structured Representation Compression for Large Language Models through Hierarchical Tensor Partitioning

Abstract

Resource-efficient neural architectures require compression techniques that retain functional capacity while reducing computational cost. Hierarchical tensor partitioning offers a structured approach to model compression by decomposing high-dimensional tensors into hierarchical components, reducing parameter storage without discarding essential representational structure. Empirical evaluation shows that parameter counts decrease substantially across multiple model layers, lowering memory consumption and inference latency. The hierarchical factorization strategy preserves linguistic coherence, as evidenced by perplexity analysis and sentence-structure assessments, while efficiency gains appear as reduced inference time and energy consumption. Variations in attention weight distributions and minor shifts in dependency retention indicate that compression affects representational expressivity, albeit within thresholds tolerable for practical deployment. The structured decomposition also alters training dynamics slightly, requiring adjusted learning rate schedules to maintain convergence stability. Assessments of long-range dependency preservation and adversarial robustness highlight the trade-offs inherent in hierarchical partitioning: efficiency gains come with marginal shifts in sensitivity to input perturbations. The experimental results suggest that hierarchical tensor partitioning balances compact model representation with computational efficiency, offering a viable compression strategy for architectures constrained by hardware limitations.
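
To make the general idea concrete, the sketch below shows one hypothetical form of hierarchical factorization applied to a single linear layer: the dense weight matrix is first partitioned into a grid of blocks (level one), and each block is then stored as a low-rank product (level two). The class name HierarchicalLowRankLinear, the block count, and the rank are illustrative assumptions; the abstract does not specify the exact decomposition the paper uses, so this is a minimal sketch rather than the authors' method.

    # Hypothetical two-level factorization of a linear layer (illustration only).
    import torch
    import torch.nn as nn

    class HierarchicalLowRankLinear(nn.Module):
        """Replaces a (d_out x d_in) dense weight with block-wise low-rank factors."""

        def __init__(self, d_in: int, d_out: int, num_blocks: int = 4, rank: int = 8):
            super().__init__()
            assert d_in % num_blocks == 0 and d_out % num_blocks == 0
            self.num_blocks = num_blocks
            self.block_in = d_in // num_blocks
            self.block_out = d_out // num_blocks
            # Level 1: partition the weight into a num_blocks x num_blocks grid.
            # Level 2: store each block as a rank-r product U[i, j] @ V[i, j].
            self.U = nn.Parameter(torch.randn(num_blocks, num_blocks, self.block_out, rank) * 0.02)
            self.V = nn.Parameter(torch.randn(num_blocks, num_blocks, rank, self.block_in) * 0.02)
            self.bias = nn.Parameter(torch.zeros(d_out))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Split the last dimension of x into input blocks of size block_in.
            x_blocks = x.reshape(*x.shape[:-1], self.num_blocks, self.block_in)
            # out_block[i] = sum_j U[i, j] @ (V[i, j] @ x_block[j])
            z = torch.einsum("ijrk,...jk->...ijr", self.V, x_blocks)
            y = torch.einsum("ijor,...ijr->...io", self.U, z)
            return y.reshape(*x.shape[:-1], -1) + self.bias

    if __name__ == "__main__":
        layer = HierarchicalLowRankLinear(d_in=512, d_out=512)
        dense_params = 512 * 512
        factored_params = sum(p.numel() for p in layer.parameters()) - 512  # exclude bias
        print(f"dense: {dense_params}, factorized: {factored_params}")
        print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])

Under these illustrative settings the stored parameters drop from 262,144 to 32,768 (roughly 8x), which mirrors the kind of storage reduction the abstract describes; the actual compression ratios, layers targeted, and decomposition structure reported in the paper may differ.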
