An Empirical Comparison of Ensemble model and Deep Learning Models for Multi-Level Arabic Fake News Classification using JoNewsFake Dataset


Abstract

The spread of fake news on Arabic social media platforms puts public trust and the integrity of information at risk. Despite significant efforts for English, Arabic remains underrepresented in multi-label, multi-level fake news detection because of its linguistic complexity and lack of resources. This study provides a comprehensive comparison of machine learning (ML) and deep learning (DL) models for multi-label, multi-level Arabic fake news classification. Using the newly constructed JoNewsFake dataset, collected from verified Jordanian news agencies, the models were trained to classify news into main categories, subcategories, and fake/real labels. Experiments included traditional ML classifiers (Random Forest, Extra Trees, and LightGBM) and advanced DL models (CNN + Bi-LSTM, CNN + Bi-GRU, fine-tuned CNN variants, and a Transformer-based model). The Extra Trees classifier outperformed all other ML and DL models, achieving F1-scores of 95% (Main Category), 98% (Subcategory), and 95% (Fake/Real). Among the DL models, the Transformer-based model yielded the best subcategory classification performance, with an F1-score of 88.2%, while CNN + Bi-GRU achieved 93% accuracy in binary fake/real classification. The study confirms the value of combining syntactic, semantic, and emotional features (Tuned AraBERT) to boost classification performance, particularly for complex subcategory tasks. These findings lay the groundwork for future enhancements in Arabic fake news detection using explainable and scalable models.
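
The abstract reports a separate F1-score for each of the three classification levels. As a rough illustration of what per-level scoring can look like, the sketch below computes an F1-score for each level on a toy example. It assumes macro averaging, which the abstract does not specify, and the level keys (`main`, `sub`, `fake`), the helper names, and the sample labels are all hypothetical and not taken from JoNewsFake.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    per_label = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


def per_level_f1(gold, predicted, levels=("main", "sub", "fake")):
    """Score each classification level independently (assumed evaluation setup)."""
    return {
        level: macro_f1([g[level] for g in gold], [p[level] for p in predicted])
        for level in levels
    }


# Hypothetical toy labels, for illustration only.
gold = [
    {"main": "politics", "sub": "elections", "fake": "real"},
    {"main": "politics", "sub": "economy", "fake": "fake"},
    {"main": "sports", "sub": "football", "fake": "real"},
    {"main": "sports", "sub": "football", "fake": "fake"},
]
predicted = [
    {"main": "politics", "sub": "elections", "fake": "real"},
    {"main": "politics", "sub": "elections", "fake": "fake"},
    {"main": "sports", "sub": "football", "fake": "real"},
    {"main": "sports", "sub": "football", "fake": "real"},
]

scores = per_level_f1(gold, predicted)
for level, score in scores.items():
    print(f"{level}: {score:.3f}")
```

Scoring each level on its own makes it visible when a model does well on the coarse main category but struggles on the finer subcategory labels, which is the kind of gap the reported results describe for the DL models.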
