ArDQA: A Parallel Multidomain Benchmark for Cross-Dialectal Arabic Question Answering

Maha Jarallah Althobaiti

Abstract

Arabic question answering (QA) has advanced for Modern Standard Arabic (MSA) thanks to benchmark datasets, most of them created by automatically translating English resources. Dialectal Arabic QA lags behind because of the lack of benchmarks and of reliable machine translation. We introduce ArDQA, the first parallel extractive QA benchmark covering five Arabic varieties (Egyptian, Gulf, Levantine, Maghrebi, and MSA) across three domains (SQuAD, Vlogs, Narratives), with a total of 8,150 QA examples. We frame ArDQA as a resource for studying transfer, domain shift, and robustness across Arabic varieties. We build it with a two-stage pipeline: native speakers translate contexts and questions from a source variety into the target varieties and then manually annotate answer spans under strict one-to-one alignment. Quality assurance includes peer review, expert adjudication, and consistency checks based on answer-to-context length ratios. We also provide a comprehensive analysis of ArDQA's characteristics. To establish baselines, we study zero-shot cross-dialect transfer, in which MSA-trained QA models answer questions in other dialects: we fine-tune three MSA-trained transformers (AraELECTRA, CAMeLBERT-MSA, and AraBERT) on an MSA QA dataset and evaluate them on ArDQA. Without any dialectal fine-tuning, AraELECTRA achieves macro F1/Exact Match scores of 70.94/57.34 on ArDQA-SQuAD, 63.10/38.69 on ArDQA-Vlogs, and 38.08/11.87 on ArDQA-Narratives, while the other two models score lower. Generalized zero-shot experiments, in which the context and the question come from different dialects, show similar degradation. These results highlight the need for dialect-specific resources and adaptation strategies to make QA robust across Arabic varieties. ArDQA thus serves as a reference resource for future work on cross-dialectal Arabic QA.
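
The abstract reports results as macro F1/Exact Match. The following is a minimal Python sketch of SQuAD-style scoring of that kind. It is not the authors' evaluation code, and it rests on several assumptions the abstract does not state: whitespace tokenization; removal of punctuation and Arabic diacritics (U+064B to U+0652) before comparison; taking the maximum over gold answers; and reading "macro" as an unweighted mean over varieties. All function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter
from statistics import mean


def normalize_answer(text: str) -> str:
    # Assumed normalization: drop Unicode punctuation (ASCII and Arabic marks
    # such as U+060C "،"), strip Arabic diacritics U+064B..U+0652, collapse spaces.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    text = re.sub("[\u064B-\u0652]", "", text)
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts shared tokens, respecting repetitions.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_variety(predictions: list[str], golds: list[list[str]]) -> tuple[float, float]:
    # Per-question score is the best match against any gold answer; returns (F1, EM) in %.
    f1 = mean(max(token_f1(p, g) for g in gs) for p, gs in zip(predictions, golds))
    em = mean(max(exact_match(p, g) for g in gs) for p, gs in zip(predictions, golds))
    return 100 * f1, 100 * em


def macro_scores(per_variety: dict[str, tuple[list[str], list[list[str]]]]):
    # Hypothetical "macro" reading: each variety weighted equally.
    scores = {v: score_variety(preds, golds) for v, (preds, golds) in per_variety.items()}
    macro_f1 = mean(f1 for f1, _ in scores.values())
    macro_em = mean(em for _, em in scores.values())
    return scores, macro_f1, macro_em


if __name__ == "__main__":
    results = {
        "MSA": (["القاهرة"], [["القاهرة"]]),
        "Egyptian": (["مدينة القاهرة"], [["القاهرة"]]),
    }
    per_variety, macro_f1, macro_em = macro_scores(results)
    print(per_variety, round(macro_f1, 2), round(macro_em, 2))
```

In this toy run the Egyptian prediction contains the gold answer plus one extra token. It therefore earns partial F1 (about 66.67) but zero Exact Match, which illustrates why the abstract's EM figures fall much further than its F1 figures. Because ArDQA is parallel with one-to-one alignment, the varieties within a domain should have equal question counts. Under that condition, averaging per-variety scores gives the same result as pooling all questions. The distinction matters only for unbalanced splits.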

Version published to 10.21203/rs.3.rs-8007777/v1 on Research Square
Nov 10, 2025

Related articles

TARGAMA: A Novel Benchmark Dataset and Framework for Translating Dialectal Arabic to English with Generative Language Models

This article has 6 authors:
1. Bouthaina Abdou
2. Hossam Elsafty
3. Farizeh Aldabbas
4. Maren Pielka
5. Rafet Sifa
6. Lucie Flek
This article has no evaluations. Latest version: Nov 20, 2025.

Evaluating Multilingual and Arabic Large Language Models for Quranic QA

This article has 3 authors:
1. Zakia Saadaoui
2. Ghassen Tlig
3. Fethi Jarray
This article has no evaluations. Latest version: Nov 20, 2025.

A Hybrid Machine Translation Framework for Low-Resource Indian Languages Using Differential Programming Loss Optimization

This article has 4 authors:
1. Rituraj Dixit
2. Sarabjeet Singh Bedi
3. Ibrahim Aljubayri
4. Mohammad Zubair Khan
This article has no evaluations. Latest version: Oct 1, 2025.
