Improving Arabic Clinical Question Quality through Domain-Adaptive Masked Language Modeling

Walid Ounachad
Mohamed Khenchouch
Imad Zeroual
Yousef Farhaoui


Abstract

Arabic clinical NLP systems often receive short, vague, or incomplete questions, which yield weak downstream answers even with strong encoders. We address this bottleneck by making question quality a first-class, measurable objective. Using domain-adaptive (continued) pretraining with a masked-language objective (DAPT-MLM) on AHQAD (~808k Arabic health Q–A pairs), we adapt two widely used backbones, AraBERT and the generator variant of AraELECTRA, to the lexical, syntactic, and discourse patterns of well-formed medical questions. Evaluation is aligned with the learning signal: we report cross-entropy and perplexity only at masked tokens, top-k accuracy restricted to masked spans, and lexical-diversity measures to discourage formulaic phrasing. A length-controlled test design (Short/Long/Very Long) isolates modeling gains from verbosity. Results show consistent intrinsic improvements for the domain-adapted models; AraBERT-MLM is best overall (macro Top-5 = 0.8392, lowest CE/PPL), outperforming AraBERT (orig.) by +6.0 pp Top-5 and AraELECTRA (orig.) by +17.2 pp. A 200-item human study (clinician + linguist) corroborates these gains (mean ± 95% CI: Clarity 4.12 ± 0.18, Fluency 3.68 ± 0.22, Semantic Fidelity 3.15 ± 0.25, Usefulness 3.42 ± 0.21; substantial agreement, κ ≈ 0.77) and highlights residual semantic drifts that inform simple, slot-constrained decoding fixes. Overall, the proposed reformulation module produces more natural and clinically relevant Arabic questions and can be plugged into Arabic clinical QA pipelines as a measurable, tunable front-end.
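The masked-token metrics named above can be illustrated with a short sketch. It assumes the model's per-position probability distributions are already available as plain Python lists (already softmax-normalised); the function `masked_token_metrics` and its arguments are hypothetical and are not taken from the authors' code. Cross-entropy is averaged only over masked positions, perplexity is its exponential, and top-k accuracy counts how often the original token is among the k most probable predictions.

```python
import math
from typing import Sequence, Tuple


def masked_token_metrics(
    probs: Sequence[Sequence[float]],
    gold_ids: Sequence[int],
    is_masked: Sequence[bool],
    k: int = 5,
) -> Tuple[float, float, float]:
    """Return (cross-entropy, perplexity, top-k accuracy) over masked positions only.

    Hypothetical helper: probs[i] is the predicted vocabulary distribution at
    position i, gold_ids[i] the original token id, and is_masked[i] marks the
    positions that were replaced by [MASK] in the input.
    """
    nll_sum = 0.0
    hits = 0
    n = 0
    for dist, gold, masked in zip(probs, gold_ids, is_masked):
        if not masked:
            continue  # unmasked tokens are skipped, mirroring the MLM loss
        n += 1
        nll_sum += -math.log(max(dist[gold], 1e-12))  # clamp to avoid log(0)
        # indices of the k highest-probability tokens at this position
        top_k = sorted(range(len(dist)), key=lambda j: dist[j], reverse=True)[:k]
        hits += int(gold in top_k)
    if n == 0:
        raise ValueError("no masked positions to score")
    ce = nll_sum / n
    return ce, math.exp(ce), hits / n


# Toy example: vocabulary of 4 tokens, 3 positions, the first two masked.
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
gold_ids = [0, 2, 3]
is_masked = [True, True, False]

ce, ppl, acc = masked_token_metrics(probs, gold_ids, is_masked, k=1)
print(round(ce, 3), round(ppl, 3), acc)  # -> 0.983 2.673 0.5
```

This pools all masked positions into one score; how the per-length-bucket results are aggregated into the reported macro Top-5 is not shown in the abstract and is not reproduced here.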

Version published to 10.21203/rs.3.rs-8007820/v1 on Research Square
Nov 19, 2025

Related articles

Entity-centric evaluation of large language model responses for medical question-answering tasks

This article has 2 authors:
1. Yi Liu
2. Vijaya B. Kolachalama
This article has no evaluations.
Latest version Nov 14, 2025

Assessing the Capability of Large Language Models in Answering Pediatric Critical Care Board-Style Questions

This article has 10 authors:
1. Daniela Chanci
2. Ronald Moore
3. Henry P. Foote
4. Matthew A. Goldstein
5. Karan R. Kumar
6. Alexandre T. Rotta
7. Christoph P. Hornik
8. Marybeth Burriss-West
9. Makenzie Hamilton
10. Rishikesan Kamaleswaran
This article has no evaluations.
Latest version Nov 3, 2025

Optimizing discharge summary generation: fine-tuning LLMs by DoRA and iterative self-evaluation for enhanced medical text generation

This article has 5 authors:
1. Wenbin Li
2. Hui Feng
3. Chao Hu
4. Minpeng Xu
5. Longlong Cheng
This article has no evaluations.
Latest version Nov 4, 2025
