Evaluating Multilingual and Arabic Large Language Models for Quranic QA
Abstract
The emergence of large language models (LLMs) has created new opportunities for interacting with religious texts. An important research question is whether broad, multilingual LLMs (e.g., GPT-4, LLaMA), equipped with extensive world knowledge, surpass smaller, domain-specific LLMs carefully trained on Arabic language data and Quranic sources. This paper explores the comparative performance of multilingual and Arabic-specific large language models on the task of Quranic question answering through a prompt-engineering-based evaluation framework. Our experiments show that, while Arabic-specific language models have strong linguistic foundations and a good grasp of classical Arabic morphology, large multilingual models consistently outperform them in factual recall, cross-linguistic inference, and complex reasoning. This superiority stems from their broader exposure to diverse multilingual corpora, richer semantic representations, and instruction-tuned reasoning capabilities. In the context of Quranic question answering, multilingual LLMs generate more coherent, contextually appropriate, and semantically accurate responses, often surpassing Arabic models in generative quality. The results therefore argue for using multilingual LLMs, possibly supplemented by fine-tuning Arabic-specific models, rather than relying exclusively on domain-specific ones.
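To make the prompt-engineering-based evaluation concrete, below is a minimal sketch of how such a comparison loop might look. The paper does not publish its harness; the model names, prompt template, and toy questions here are illustrative assumptions, using the Hugging Face `transformers` text-generation pipeline as a stand-in for whichever models were actually evaluated.

```python
# Hypothetical sketch of a prompt-based Quranic QA evaluation loop.
# Model names, the prompt template, and the sample questions are
# assumptions for illustration, not taken from the paper.
from transformers import pipeline

PROMPT = (
    "You are an expert on the Quran. Answer the question concisely, "
    "citing the relevant surah and verse where possible.\n"
    "Question: {question}\nAnswer:"
)

def generate_answers(model_name: str, questions: list[str]) -> list[str]:
    """Query one candidate model with the same prompt template."""
    generator = pipeline("text-generation", model=model_name)
    answers = []
    for q in questions:
        out = generator(
            PROMPT.format(question=q),
            max_new_tokens=128,
            do_sample=False,          # deterministic decoding for comparability
            return_full_text=False,   # keep only the generated continuation
        )
        answers.append(out[0]["generated_text"].strip())
    return answers

if __name__ == "__main__":
    questions = ["Which surah is known as the heart of the Quran?"]
    # A multilingual model and an Arabic-specific model would each be
    # run with identical prompts, and their outputs scored afterwards.
    for model_name in ["gpt2"]:  # placeholder; substitute the models under study
        for q, a in zip(questions, generate_answers(model_name, questions)):
            print(f"[{model_name}] Q: {q}\nA: {a}\n")
```

The key design choice such a harness reflects is that every model receives an identical prompt and deterministic decoding settings, so differences in the outputs can be attributed to the models rather than to the prompting.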