QNLP-Bench: A Standardized Benchmark and Evaluation Framework for Quantum Natural Language Processing
Abstract
Quantum Natural Language Processing (QNLP) seeks to combine compositional models of meaning with quantum and quantum-inspired computational frameworks. While recent work has demonstrated the feasibility of implementing such models, empirical results remain difficult to compare due to heterogeneous evaluation practices, weak or mismatched baselines, and limited reporting of computational resources. As a result, it is often unclear when compositional or quantum-inspired approaches are genuinely warranted.

In this work, we introduce QNLP-Bench, a standardized benchmark and evaluation framework designed to enable fair, reproducible, and compute-aware assessment of QNLP models. The benchmark defines shared task specifications, fixed preprocessing pipelines, capacity-matched classical baselines, and explicit resource-reporting requirements across classical, quantum-inspired, and quantum models. Rather than targeting claims of quantum advantage, QNLP-Bench aims to clarify the empirical conditions under which compositional structure contributes to improved generalization.

We provide a minimal but diagnostic experimental instantiation of the benchmark using controlled sentence classification tasks that separate lexically driven and role-sensitive regimes. The results demonstrate that simple baselines suffice on lexically separable tasks, while explicit encoding of grammatical structure is necessary for success on role-dependent problems, albeit at increased computational cost. These findings illustrate the value of standardized, diagnostic benchmarks for grounding progress in QNLP and for guiding future research toward transparent and principled evaluation practices.