ArDQA: A Parallel Multidomain Benchmark for Cross-Dialectal Arabic Question Answering


Abstract

Arabic question answering (QA) has advanced for Modern Standard Arabic (MSA) thanks to benchmark datasets, most of them created via automatic translation from English. Dialectal Arabic QA lags behind due to a lack of benchmarks and of reliable machine translation. We introduce ArDQA, the first parallel extractive QA benchmark covering five Arabic varieties (Egyptian, Gulf, Levantine, Maghrebi, and MSA) across three domains (SQuAD, Vlogs, Narratives), for a total of 8,150 QA examples. We frame ArDQA as a resource for studying transfer, domain shift, and robustness across Arabic varieties. We employ a two-stage pipeline: native speakers translate contexts and questions from a source variety into target varieties, then manually annotate answer spans with strict one-to-one alignment. Quality assurance includes peer review, expert adjudication, and consistency checks based on answer-to-context length ratios. We also provide a comprehensive analysis of ArDQA characteristics. To establish baselines, we examine zero-shot cross-dialect transfer, using MSA-trained QA models to answer questions in other dialects: we fine-tune three MSA-trained transformers (AraELECTRA, CAMeLBERT-MSA, and AraBERT) on an MSA QA dataset and evaluate them on ArDQA. Without any dialectal fine-tuning, AraELECTRA achieves macro F1/Exact Match scores of 70.94/57.34 on ArDQA-SQuAD, 63.10/38.69 on ArDQA-Vlogs, and 38.08/11.87 on ArDQA-Narratives, while the other models exhibit lower performance. Generalized zero-shot experiments, where the context and question come from different dialects, show similar degradation. These results highlight the need for dialect-specific resources and adaptation strategies to improve QA robustness across Arabic varieties. ArDQA thus serves as a reference resource for future work on cross-dialectal Arabic QA.
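For readers unfamiliar with the reported metrics, the following is a minimal sketch of how token-level F1 and Exact Match might be computed and macro-averaged for extractive QA. It rests on assumptions not stated in the abstract: tokens are split on whitespace with no Arabic-specific normalization, and "macro" is taken to mean an unweighted mean over varieties. The function names and the input layout are hypothetical and are not taken from the ArDQA release.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the whitespace-tokenized spans are identical, else 0.0."""
    return float(prediction.split() == gold.split())


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and a gold span."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_scores(results_by_variety):
    """Average F1/EM within each variety, then across varieties.

    results_by_variety maps a variety name (hypothetical keys such as
    "egyptian" or "gulf") to a non-empty list of (prediction, gold) pairs.
    Assumption: "macro" means an unweighted mean over varieties.
    Scores are returned as percentages.
    """
    per_variety = {}
    for variety, pairs in results_by_variety.items():
        f1 = sum(token_f1(p, g) for p, g in pairs) / len(pairs)
        em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
        per_variety[variety] = (100.0 * f1, 100.0 * em)
    n = len(per_variety)
    macro_f1 = sum(f1 for f1, _ in per_variety.values()) / n
    macro_em = sum(em for _, em in per_variety.values()) / n
    return macro_f1, macro_em, per_variety
```

Averaging within each variety first keeps a variety with more examples from dominating the overall score, which is one plausible reason to report macro rather than pooled figures.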
