Multi-modal News Recommendation with Deep Text Modeling and Multi‑View Image Fusion
Abstract
In the digital media era, the sheer volume of news content has led to information overload, making it challenging for users to find relevant articles efficiently. Existing methods often rely on collaborative filtering and click histories but lack deep semantic modeling of news text, overlooking both local and global word dependencies. Static attention mechanisms further limit the ability to dynamically highlight context‑specific key information, while most multimodal systems extract image features from a single perspective and thus fail to capture the full visual context. To overcome these limitations, we propose M‑GRU‑DHA (Multi‑view GRU with Dynamic Hybrid Attention), which integrates two modules: GRU‑DHA for text and MV‑NPIC (Multi‑View News Picture Information Capturing) for images. GRU‑DHA combines a GRU that models global dependencies with a CNN that extracts local semantics, augmented by a dynamic hybrid attention mechanism that adaptively focuses on key words. MV‑NPIC enriches multimodal fusion by extracting cover‑image features from multiple viewpoints, thereby capturing more comprehensive visual semantics. Extensive experiments on real‑world datasets demonstrate that M‑GRU‑DHA consistently outperforms mainstream methods on standard evaluation metrics, validating the effectiveness of our approach for personalized news recommendation.
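To make the text-encoder design concrete, the following PyTorch-style sketch shows one plausible way to couple a GRU (global word dependencies) with a CNN (local semantics) and pool the fused word representations with an attention layer. It is a minimal illustration only: the module names, dimensions, and the gating used here for "hybrid" fusion are assumptions and are not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation): a news-text encoder that
# fuses GRU-based global features with CNN-based local features, then applies a
# learned attention over words to produce a single news vector.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GRUCNNAttentionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional GRU models global (long-range) dependencies between words.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # 1-D convolution captures local n-gram semantics around each word.
        self.cnn = nn.Conv1d(embed_dim, 2 * hidden_dim, kernel_size, padding=kernel_size // 2)
        # Gate that mixes global and local features per word (assumed "hybrid" fusion).
        self.gate = nn.Linear(4 * hidden_dim, 2 * hidden_dim)
        # Additive attention pools word vectors into one news-text representation.
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embed(token_ids)                        # (batch, seq_len, embed_dim)
        global_feat, _ = self.gru(x)                     # (batch, seq_len, 2*hidden)
        local_feat = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, seq_len, 2*hidden)
        g = torch.sigmoid(self.gate(torch.cat([global_feat, local_feat], dim=-1)))
        fused = g * global_feat + (1 - g) * local_feat   # per-word hybrid representation
        weights = F.softmax(self.attn(torch.tanh(fused)), dim=1)  # attention over words
        return (weights * fused).sum(dim=1)              # (batch, 2*hidden) news vector


if __name__ == "__main__":
    encoder = GRUCNNAttentionEncoder(vocab_size=30000)
    tokens = torch.randint(0, 30000, (4, 20))            # 4 news titles, 20 tokens each
    print(encoder(tokens).shape)                          # torch.Size([4, 256])
```

In this sketch the sigmoid gate plays the role of a word-level fusion weight between the global and local views; the paper's dynamic hybrid attention and its multi-view image module (MV‑NPIC) are not reproduced here.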