Enhancing Multimodal Recommendation via Contrastive Self-Supervised Modality-Preserving Learning
Abstract
Multimodal recommendation systems have gained increasing attention for their ability to incorporate rich side information such as visual and textual features. However, a critical yet underexplored challenge is the insufficient preservation of modality-specific information during training, which can weaken the effectiveness of multimodal signals and limit recommendation accuracy. To address this limitation, we propose Contrastive Modality-Preserving Learning (CMPL), a novel framework that extends the state-of-the-art MONET architecture. CMPL introduces a before-and-after contrastive learning module that explicitly maximizes the mutual information between initial modality embeddings and their final representations, thereby ensuring stronger modality preservation. At the same time, a graph convolutional backbone captures high-order collaborative signals from the user–item interaction graph, while a target-aware attention mechanism adaptively emphasizes user preference patterns. This joint design allows CMPL to balance the preservation of modality cues with the exploitation of collaborative filtering signals. We conduct extensive experiments on two real-world Amazon datasets, Office and MenClothing, and the results consistently show that CMPL outperforms competitive baselines, including MARIO and MONET, in terms of precision and recall. These findings validate the effectiveness of our approach and further highlight the necessity of explicitly modeling modality preservation for robust multimodal recommendation.
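To make the before-and-after contrastive objective concrete, the sketch below shows one common way such a modality-preserving term is implemented: an in-batch InfoNCE loss between each item's initial modality embedding and its final representation, which lower-bounds their mutual information. This is an illustrative assumption rather than the paper's released code; the function name `modality_preserving_loss`, the cosine-similarity scoring, and the `temperature` value are all hypothetical choices.

```python
# Minimal sketch of a before-and-after contrastive (InfoNCE) objective,
# assuming in-batch negatives and cosine similarity. Not the authors' code;
# names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F


def modality_preserving_loss(initial_emb: torch.Tensor,
                             final_emb: torch.Tensor,
                             temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE-style lower bound on the mutual information between initial
    modality embeddings (before propagation) and final representations
    (after the GCN backbone / attention), for the same batch of items.

    initial_emb, final_emb: tensors of shape (batch_size, dim).
    """
    # Normalize so the dot product becomes cosine similarity.
    z_before = F.normalize(initial_emb, dim=-1)
    z_after = F.normalize(final_emb, dim=-1)

    # (B, B) similarity matrix; the diagonal holds positive pairs,
    # all other entries act as in-batch negatives.
    logits = z_before @ z_after.t() / temperature
    labels = torch.arange(z_before.size(0), device=z_before.device)

    # Symmetric InfoNCE: score matches in both directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    # Example usage with random stand-ins for, e.g., visual item embeddings
    # before and after graph propagation.
    before = torch.randn(256, 64)
    after = torch.randn(256, 64)
    print(modality_preserving_loss(before, after).item())
```

In practice such a term would be added to the recommendation loss (e.g., BPR) with a weighting coefficient, so that modality preservation and collaborative filtering signals are optimized jointly.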