Detecting Machine-Generated Arabic Text Using AraBERT and LSTM: Toward Trustworthy NLP in Low-Resource Languages

Abstract

Deepfake text generation has emerged as a serious challenge in the age of advanced language models, particularly in low-resource languages like Arabic. This study presents a deep learning-based approach to detect synthetic Arabic text generated by AI systems. We propose a binary classification framework combining AraBERT embeddings with a Long Short-Term Memory (LSTM) network. A balanced dataset of 87,452 samples was constructed using real Arabic text and synthetic text generated via AraGPT2. Our best-performing model achieved a test accuracy of 99.5%, demonstrating strong generalization and detection capability. This work contributes to enhancing Arabic NLP security and offers a foundation for future multilingual deepfake detection systems.
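The abstract describes the architecture only at a high level (AraBERT embeddings feeding an LSTM for binary classification). The sketch below is an illustrative reconstruction of that pipeline, not the authors' code: the model checkpoint name, hidden sizes, frozen-encoder choice, and pooling strategy are assumptions.

```python
# Minimal sketch (assumed details, not the paper's implementation):
# AraBERT contextual embeddings feeding an LSTM binary classifier
# (human-written vs. machine-generated Arabic text).
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class AraBertLstmClassifier(nn.Module):
    def __init__(self, bert_name="aubmindlab/bert-base-arabertv2",  # assumed AraBERT variant
                 lstm_hidden=128, freeze_bert=True):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        if freeze_bert:  # assumption: AraBERT used as a fixed feature extractor
            for p in self.bert.parameters():
                p.requires_grad = False
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, 1)  # single logit: real vs. synthetic

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from AraBERT
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Sequence modeling with a bidirectional LSTM; pool the final hidden states
        _, (h_n, _) = self.lstm(hidden)
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(pooled).squeeze(-1)  # logit for BCEWithLogitsLoss

# Usage example on a single Arabic sentence
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")
model = AraBertLstmClassifier()
batch = tokenizer(["نص عربي للتجربة"], padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
probs = torch.sigmoid(logits)  # probability the text is machine-generated
```

In such a setup the model would be trained with a binary cross-entropy loss on the balanced real/AraGPT2-generated corpus described above; whether the paper fine-tunes AraBERT end-to-end or keeps it frozen is not stated in the abstract.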