ASRD: Development and Validation of a Large-Scale Arabic Semantic Relation Dataset

Abstract

This paper presents the development and validation of the Arabic Semantic Relations Dataset (ASRD), a large-scale, high-quality lexical resource designed to support research in Arabic lexical semantics. ASRD addresses the lack of robust, publicly available Arabic datasets annotated for semantic relations, especially hypernymy. It was built by aggregating and aligning data from multiple Arabic lexical sources, followed by extensive cleaning, annotation, and validation, with validation conducted in collaboration with expert Arabic linguists. We describe the sources, extraction pipeline, structural characteristics, and validation strategy, and demonstrate the dataset’s utility in downstream NLP tasks such as hypernymy detection and hypernymy directionality. The dataset is publicly available on Zenodo at https://doi.org/10.5281/zenodo.15486725.
