ASRD: Development and Validation of a Large-Scale Arabic Semantic Relation Dataset

Randah Alharbi
Tarek Helmy
Atika Al-Saghyir
Safa Aglan
Abdulrahman Alosaimy
Husni Al-Muhtaseb

Abstract

This paper presents the development and validation of the Arabic Semantic Relations Dataset (ASRD), a large-scale, high-quality lexical resource designed to support research in Arabic lexical semantics. ASRD addresses the lack of robust, publicly available Arabic datasets annotated for semantic relations, especially hypernymy. It was built by aggregating and aligning data from multiple Arabic lexical sources, then cleaning, annotating, and validating the result in collaboration with expert Arabic linguists. We describe the sources, extraction pipeline, structural characteristics, and validation strategy, and demonstrate the dataset’s utility in downstream NLP tasks such as hypernymy detection and hypernymy directionality. The dataset is publicly available on Zenodo at https://doi.org/10.5281/zenodo.15486725.

Version published to 10.21203/rs.3.rs-7602223/v1 on Research Square, Dec 10, 2025

Related articles

Ekantipur-15Y: A Longitudinal Benchmark Corpus and Semantic Analysis of Nepali News (2010 - 2025)

This article has 2 authors:
1. Diwash Mainali
2. Utsav Mainali
This article has no evaluations. Latest version: Mar 3, 2026

A diagnostic and evaluative analysis of PARSEME corpora complexity

This article has 3 authors:
1. Santiago Fernández Lanza
2. Víctor Manuel Darriba Bilbao
3. Daniel Fernández-González
This article has no evaluations. Latest version: Mar 30, 2026

Relation Extraction (RE) Model for Afaan Oromo Text Using Self-Attention Mechanisms

This article has 1 author:
1. Lingerew Bantie
This article has no evaluations. Latest version: Feb 26, 2026
