It's All Connected: A Survey for Multimodal Arabic AI

Farizeh Aldabbas
Hossam Elsafty
Rafet Sifa

Abstract

Multimodal AI integrates text, vision, and speech within unified reasoning frameworks, yet Arabic remains significantly underrepresented due to diglossia, morphological complexity, and scarce multimodal resources. This survey delivers the first comprehensive technical roadmap for Arabic multimodal AI, covering the progression from unimodal Arabic NLP, OCR, and ASR to recent Arabic-capable Multimodal Large Language Models (MLLMs). We review available multimodal datasets, modality encoders, tokenization approaches, connector designs, and fusion strategies used in state-of-the-art systems. We also provide the first consolidated evaluation of Arabic-capable MLLMs on the multimodal benchmarks ARB and PEARL, analyzing performance, robustness, and domain generalization across OCR-grounded and open-domain VQA settings. Despite recent progress, challenges persist in cultural grounding, dialect inclusivity, dataset scale, and open-access ecosystem maturity. We outline actionable directions for scalable and culturally aligned Arabic multimodal intelligence, including parameter-efficient adaptation, broader corpus development, and unified evaluation protocols. By consolidating technical advances and empirical insights, this survey establishes a foundation to guide the next generation of Arabic-centric multimodal research.

Version published to 10.21203/rs.3.rs-8007923/v1 on Research Square
Nov 20, 2025
