A Study on Enhancing the Inference Efficiency of Generative Recommender Systems Using Deep Model Compression
Abstract
To improve the inference efficiency of generative recommender systems in practical deployments, this study constructs a deep model compression framework that integrates structured pruning, dynamic quantization, and knowledge distillation, and examines the combined impact of these coordinated strategies on model size, response latency, and recommendation accuracy. Using public datasets such as MovieLens-1M and Alibaba Tianchi as benchmarks, the performance and system bottlenecks of different compression combinations are analyzed. A hybrid parallel scheduling mechanism and a multi-level cache optimization system are built to strengthen the compressed model's execution capability at the inference stage. The results show that joint application of the three strategies reduces inference latency to 15.3 ms while compressing the model parameters to 22.8% of the original, with only a 0.4% drop in HR@10, significantly improving operational efficiency and deployment adaptability.
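To make the three-strategy pipeline concrete, the following is a minimal PyTorch sketch of how structured pruning, knowledge distillation, and dynamic quantization could be combined on a toy scoring head. The abstract does not specify the paper's architecture or hyperparameters, so the model class (ScoringHead), its dimensions, the pruning ratio, and the distillation temperature are all illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# Hypothetical stand-in for the recommender's scoring head; the
# paper's actual generative architecture is not given in the abstract.
class ScoringHead(nn.Module):
    def __init__(self, dim_in=128, dim_hidden=256, n_items=1000):
        super().__init__()
        self.fc1 = nn.Linear(dim_in, dim_hidden)
        self.fc2 = nn.Linear(dim_hidden, n_items)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

teacher = ScoringHead()                 # full-size model used as teacher
student = ScoringHead(dim_hidden=128)   # smaller student to be compressed

# 1) Structured pruning: zero out 30% of fc1's output channels (whole
#    rows) by L2 norm, then bake the mask into the weights. A real
#    deployment would physically slim the layer afterward.
prune.ln_structured(student.fc1, name="weight", amount=0.3, n=2, dim=0)
prune.remove(student.fc1, "weight")

# 2) Knowledge distillation: soften both logit distributions at
#    temperature T and match them with KL divergence.
def distill_loss(student_logits, teacher_logits, T=2.0):
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

x = torch.randn(32, 128)                # dummy batch of user/context embeddings
with torch.no_grad():
    t_logits = teacher(x)
loss = distill_loss(student(x), t_logits)
loss.backward()                         # one distillation step (optimizer omitted)

# 3) Dynamic quantization: convert the remaining Linear layers to int8
#    weights, with activations quantized on the fly at inference time.
student.eval()
compressed = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)
print(compressed(x).shape)              # torch.Size([32, 1000])
```

The ordering here (prune, then distill, then quantize) is one plausible arrangement; the abstract reports only that the three strategies are applied jointly, not the sequence or schedule used in the paper's framework.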