Evaluating Multilingual and Arabic Large Language Models for Quranic QA
Abstract
The emergence of large language models (LLMs) has created new opportunities for interacting with religious texts. An important research question is whether broad, multilingual LLMs (e.g., GPT-4, LLaMA), equipped with extensive world knowledge, surpass smaller, domain-specific LLMs carefully trained on Arabic language data and Quranic sources. This paper explores the comparative performance of multilingual and Arabic-specific large language models on the task of Quranic question answering through a prompt-engineering-based evaluation framework. Our experiments show that, while Arabic-specific language models have strong linguistic foundations and a good grasp of classical Arabic morphology, large multilingual models consistently outperform them in factual recall, cross-linguistic inference, and complex reasoning. This superiority stems from their broader exposure to diverse multilingual corpora, richer semantic representations, and instruction-tuned reasoning capabilities. In the context of Quranic question answering, multilingual LLMs generate more coherent, contextually appropriate, and semantically accurate responses, often surpassing Arabic models in generative quality. The results therefore argue for using multilingual LLMs, possibly supplemented by fine-tuning Arabic-specific models, rather than relying exclusively on domain-specific ones.
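To make the prompt-engineering-based evaluation concrete, below is a minimal sketch of how such a comparison loop might look. The paper does not publish its harness; the model names, prompt template, and toy questions here are illustrative assumptions, using the Hugging Face `transformers` text-generation pipeline as a stand-in for whichever models were actually evaluated.

```python
# Hypothetical sketch of a prompt-based Quranic QA evaluation loop.
# Model names, the prompt template, and the sample questions are
# assumptions for illustration, not taken from the paper.
from transformers import pipeline

PROMPT = (
    "You are an expert on the Quran. Answer the question concisely, "
    "citing the relevant surah and verse where possible.\n"
    "Question: {question}\nAnswer:"
)

def generate_answers(model_name: str, questions: list[str]) -> list[str]:
    """Query one candidate model with the same prompt template."""
    generator = pipeline("text-generation", model=model_name)
    answers = []
    for q in questions:
        out = generator(
            PROMPT.format(question=q),
            max_new_tokens=128,
            do_sample=False,          # deterministic decoding for comparability
            return_full_text=False,   # keep only the generated continuation
        )
        answers.append(out[0]["generated_text"].strip())
    return answers

if __name__ == "__main__":
    questions = ["Which surah is known as the heart of the Quran?"]
    # A multilingual model and an Arabic-specific model would each be
    # run with identical prompts, and their outputs scored afterwards.
    for model_name in ["gpt2"]:  # placeholder; substitute the models under study
        for q, a in zip(questions, generate_answers(model_name, questions)):
            print(f"[{model_name}] Q: {q}\nA: {a}\n")
```

The key design choice such a harness reflects is that every model receives an identical prompt and deterministic decoding settings, so differences in the outputs can be attributed to the models rather than to the prompting.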