Optimizing Fake News Detection in Low-Resource Languages: A Comparative Study of Deep Learning Models Using Sentence-Level FastText Vectors in Kurdish and English
Abstract
The rapid dissemination of misinformation on social media is a growing concern. Low-resource languages such as Kurdish are particularly affected, since the scarcity of linguistic resources makes the problem harder to identify and study. This work applies three deep learning (DL) models to misinformation detection in Kurdish and English: Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Convolutional Neural Network (CNN). The models were evaluated on the Kurdish Dataset for Fake News Detection (KDFND). Among the three, CNN achieved the highest accuracy (97.40% for Kurdish). We also examined how FastText embeddings affect performance by comparing models with and without embedding layers, aiming to identify the most accurate and fastest configuration. Our experiments show that models fed sentence-level FastText vectors (without embedding layers) perform substantially better, reaching almost 97% accuracy for Kurdish and 96% for English, whereas models with pretrained embedding layers attain only about 50% accuracy. These results demonstrate the limitations of static embeddings in low-resource settings and show that simple, flexible models can detect fake news without extensive pretraining. This research contributes to the development of NLP techniques for low-resource languages and has practical implications for multilingual fake news detection systems.
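The sentence-level representation described above can be illustrated with a minimal sketch: each sentence is mapped to a single fixed-size vector by averaging its word vectors, and that vector is fed directly to the classifier instead of a trainable embedding layer. The toy embedding table below is an assumption standing in for a real FastText model; in the paper these vectors would come from FastText models trained for Kurdish and English.

```python
import numpy as np

# Toy embedding table standing in for pretrained FastText word vectors
# (hypothetical; a real pipeline would load a trained FastText model).
EMB_DIM = 4
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=EMB_DIM)
         for w in ["this", "story", "is", "fake", "news", "true"]}

def sentence_vector(tokens, vocab, dim=EMB_DIM):
    """Average the word vectors of a sentence into one fixed-size vector.
    Out-of-vocabulary tokens are skipped here; FastText itself would
    instead back off to character n-gram vectors for unseen words."""
    vecs = [vocab[t] for t in tokens if t in vocab]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# One vector per sentence, regardless of sentence length.
v = sentence_vector("this story is fake news".split(), vocab)
print(v.shape)  # (4,)
```

Because every sentence becomes a single dense vector of fixed dimension, the downstream CNN/LSTM classifier needs no embedding layer of its own, which is the configuration the abstract reports as both faster and more accurate.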