MLLM4Rec: Multimodal Information Enhancing LLM for Sequential Recommendation

Abstract

In recent years, with the advent of large language models (LLMs) such as GPT-4, LLaMA, and ChatGLM, leveraging multimodal information (e.g., images and audio) to enhance recommendation systems has become possible. To further improve the performance of LLM-based recommendation systems, we propose MLLM4Rec, a sequential recommendation framework grounded in LLMs. Specifically, our approach integrates multimodal information, with a focus on image data, into LLMs to improve recommendation accuracy. By employing a hybrid prompt learning mechanism combined with role-playing for model fine-tuning, MLLM4Rec effectively bridges the gap between textual and visual representations, enabling text-based LLMs to "read" and interpret images. Moreover, the fine-tuned LLM is used to rank retrieval candidates, preserving its generative capabilities while optimizing item ranking according to user preferences. Extensive experiments on three publicly available benchmark datasets demonstrate that MLLM4Rec outperforms traditional sequential recommendation models and pre-trained multimodal models in terms of NDCG, MRR, and Recall.
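To make the prompting idea concrete, the sketch below shows one plausible way a role-playing, hybrid prompt could combine textual item metadata with image-derived captions and ask the LLM to rank retrieved candidates. This is an illustrative assumption, not the authors' implementation: the `Item` fields, the `build_ranking_prompt` helper, and the role text are all hypothetical.

```python
# Minimal sketch (not the paper's code) of a role-playing hybrid prompt for
# LLM-based candidate ranking. All names and field choices are illustrative.

from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    title: str          # textual metadata for the item
    image_caption: str  # caption from an image-to-text model, standing in for the
                        # visual modality so a text-only LLM can "read" the image


def build_ranking_prompt(history: List[Item], candidates: List[Item]) -> str:
    """Compose a role-playing prompt mixing textual and image-derived signals,
    then ask the LLM to rank retrieved candidates by inferred user preference."""
    role = (
        "You are a recommendation assistant. Based on the user's interaction "
        "history, rank the candidate items from most to least relevant."
    )
    history_lines = [
        f"{i + 1}. {it.title} (image: {it.image_caption})"
        for i, it in enumerate(history)
    ]
    candidate_lines = [
        f"[{chr(ord('A') + i)}] {it.title} (image: {it.image_caption})"
        for i, it in enumerate(candidates)
    ]
    return "\n".join(
        [role, "", "Interaction history:"]
        + history_lines
        + ["", "Candidate items:"]
        + candidate_lines
        + ["", "Answer with the candidate letters in ranked order."]
    )


if __name__ == "__main__":
    history = [Item("Trail running shoes", "blue mesh sneakers on a rocky path")]
    candidates = [
        Item("Hiking backpack", "a 30L green backpack with side pockets"),
        Item("Office chair", "a black ergonomic desk chair"),
    ]
    print(build_ranking_prompt(history, candidates))
    # The resulting string would be sent to the fine-tuned LLM; its ranked reply
    # is then mapped back to item IDs for evaluation (NDCG, MRR, Recall).
```

In such a setup, the image-to-text step is what lets a text-only LLM incorporate the visual modality; the fine-tuned model then reorders a small retrieved candidate set rather than generating items from scratch.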
