FaseehGPT: A Lightweight Transformer Model for Arabic Text Generation with Enhanced Morphological Understanding
Abstract
We present FaseehGPT, a specialized transformer-based language model designed for high-quality Arabic text generation in resource-constrained environments. Unlike existing Arabic language models, which primarily focus on understanding tasks, FaseehGPT is optimized for generative applications while remaining efficient enough to deploy on consumer-grade hardware. The model employs a decoder-only transformer architecture with 70.7 million parameters, trained on a carefully curated corpus of 8.7 million Arabic texts spanning colloquial tweets, formal news articles, and classical literature. Our approach leverages the morphological richness of Arabic through subword tokenization with a pre-trained Arabic BERT tokenizer, enabling effective handling of the language's complex derivational and inflectional patterns. Extensive evaluation demonstrates that FaseehGPT generates coherent, contextually appropriate text across multiple Arabic varieties and registers. The model achieves competitive performance while requiring significantly fewer computational resources than comparable systems; training was completed on a single NVIDIA T4 GPU. Evaluation metrics improve consistently across training epochs, and final perplexity scores indicate language modeling performance comparable to that of larger Arabic models. We provide comprehensive technical details and reproducible training procedures, and we release the complete model and codebase to advance Arabic NLP research: https://huggingface.co/alphatechlogics/FaseehGPT
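As a quick illustration of the intended workflow, the minimal sketch below loads the published checkpoint from the Hugging Face Hub and samples a short continuation. It assumes the repository is compatible with the standard `transformers` Auto classes (if it ships a custom architecture, `trust_remote_code=True` may be required); the prompt and sampling parameters are illustrative, not the settings used in our evaluation.

```python
# Minimal usage sketch, assuming the checkpoint loads via the standard
# transformers Auto classes; pass trust_remote_code=True if the repo
# defines a custom architecture.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "alphatechlogics/FaseehGPT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "اللغة العربية"  # illustrative prompt: "The Arabic language"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling parameters here are illustrative, not the paper's settings.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```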