Multimodal Social Media Fake News Detection Using RoBERTa and Vision Transformer Encoders with Reliability Aware Adaptive Fusion



Abstract

The rapid rise of multimodal content on social media has made fake news detection technically harder. Deceptive posts frequently pair seemingly credible text with contextually irrelevant or manipulated images, posing significant societal risks and highlighting the need for robust, automated detection mechanisms. Recent multimodal fake news detection models such as EANN, SpotFake, MCAN, LIIMR, MFND-CMM, PFBL, and TMEF-BI have shown notable improvements, but many existing approaches still rely on shallow fusion strategies, suffer from modality dominance, or assume that textual and visual information are equally reliable for every sample. This work introduces RoViT-Detect, a cross-modal framework that jointly models textual, visual, and behavioral features using RoBERTa and ViT-based encoders, and combines their complementary cues through late fusion followed by an MLP classifier. In addition, motivated by reliability-aware and uncertainty-aware fusion frameworks such as PFBL and TMEF-BI, an adaptive fusion strategy is incorporated to down-weight less informative modalities and reduce modality dominance during training. Experiments on two widely used benchmark datasets, Twitter (English) and Weibo (Chinese), which cover noisy, heterogeneous, and event-driven social media environments, show that the proposed approach consistently outperforms state-of-the-art multimodal methods. RoViT-Detect achieves 98.34% accuracy on the Weibo dataset and 99.82% accuracy on the Twitter dataset. These results indicate that explicitly modelling cross-modal interactions and modality reliability leads to more robust fake news detection in dynamic social media environments, and position RoViT-Detect as a strong and scalable framework for real-world social media applications.
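
The abstract does not spell out how the adaptive fusion is computed, so the following is only a minimal sketch of one common way to realise reliability-weighted late fusion, not the authors' implementation. It assumes each modality already has an embedding (random placeholder vectors stand in here for RoBERTa, ViT, and behavioral features), that a scalar reliability logit is available per modality (the values below are hypothetical), and that the fused vector feeds a small two-layer MLP with illustrative, untrained weights. The names `adaptive_fuse`, `mlp_classify`, and `reliability_logits` are introduced for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(embeddings, reliability_logits):
    """Scale each modality embedding by its softmax reliability weight,
    then concatenate the scaled embeddings (late fusion)."""
    weights = softmax(reliability_logits)
    fused = []
    for w, emb in zip(weights, embeddings):
        fused.extend(w * x for x in emb)
    return fused, weights


def mlp_classify(x, w1, b1, w2, b2):
    """Two-layer MLP: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)

# Placeholder embeddings (assumption): real ones would come from the encoders.
text_emb = [random.gauss(0, 1) for _ in range(8)]
image_emb = [random.gauss(0, 1) for _ in range(8)]
behavior_emb = [random.gauss(0, 1) for _ in range(4)]

# Hypothetical per-sample reliability logits: text trusted most, behavior least.
reliability_logits = [2.0, 0.5, -1.0]

fused, weights = adaptive_fuse(
    [text_emb, image_emb, behavior_emb], reliability_logits
)

# Untrained, randomly initialised classifier weights (illustration only).
hidden_dim = 6
w1 = [[random.uniform(-0.1, 0.1) for _ in fused] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [random.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
b2 = 0.0

prob = mlp_classify(fused, w1, b1, w2, b2)
print("modality weights:", weights)
print("score from untrained classifier:", prob)
```

In this sketch a modality with a low reliability logit contributes a correspondingly smaller share of the fused vector, which is the intuition behind limiting modality dominance. In a trained system the logits would typically be predicted per sample by a learned gating network rather than fixed by hand.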
