Detecting Machine-Generated Arabic Text Using AraBERT and LSTM: Toward Trustworthy NLP in Low-Resource Languages
Abstract
Deepfake text generation has emerged as a serious challenge in the age of advanced language models, particularly in low-resource languages like Arabic. This study presents a deep learning-based approach to detect synthetic Arabic text generated by AI systems. We propose a binary classification framework combining AraBERT embeddings with a Long Short-Term Memory (LSTM) network. A balanced dataset of 87,452 samples was constructed using real Arabic text and synthetic text generated via AraGPT2. Our best-performing model achieved a test accuracy of 99.5%, demonstrating strong generalization and detection capability. This work contributes to enhancing Arabic NLP security and offers a foundation for future multilingual deepfake detection systems.
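As an illustration of the kind of framework described above, the following is a minimal sketch of an LSTM classification head over token embeddings, written in PyTorch. It is not the authors' implementation: the class name `LSTMDetector`, the hidden size of 128, the single unidirectional layer, and the use of the final hidden state are all assumptions, since the abstract does not specify them. Random tensors stand in for AraBERT outputs so the example runs without downloading a model; the embedding width of 768 matches BERT-base-sized encoders.

```python
import torch
import torch.nn as nn


class LSTMDetector(nn.Module):
    """Hypothetical LSTM head over precomputed token embeddings."""

    def __init__(self, embed_dim=768, hidden_dim=128):
        super().__init__()
        # Assumed configuration: one unidirectional LSTM layer.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Single output unit for the binary decision (human vs. machine).
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embeddings)
        # h_n[-1] is the final hidden state of the last layer: (batch, hidden_dim).
        # Padding is ignored here for brevity; real batches would need masking
        # or packed sequences.
        return self.classifier(h_n[-1]).squeeze(-1)  # raw logits, shape (batch,)


torch.manual_seed(0)
model = LSTMDetector()

# Stand-in for AraBERT token embeddings: 4 texts, 32 tokens each, 768 dims.
fake_embeddings = torch.randn(4, 32, 768)
# Assumed label convention: 0 = human-written, 1 = machine-generated.
labels = torch.tensor([0.0, 1.0, 0.0, 1.0])

logits = model(fake_embeddings)
loss = nn.BCEWithLogitsLoss()(logits, labels)
probs = torch.sigmoid(logits)  # estimated probability of "machine-generated"
```

In a full pipeline, `fake_embeddings` would be replaced by the last hidden states of an AraBERT encoder, and the loss would be minimized over the training split before accuracy is measured on held-out data.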