From Norm to Usage: Computational Insights into Genitive–Accusative Alternation in Ukrainian and Russian


Abstract

While sharing a common Slavic origin, Ukrainian and Russian exhibit both grammatical similarities and significant differences, particularly in noun morphology and vocabulary. This study investigates the representation of Ukrainian noun morphology, focusing on grammatical case and number, within distributional semantic vector spaces (fastText embeddings). We pay special attention to the complex relationship between the Accusative (Acc) and Genitive (Gen) cases, known for frequent alternation and syncretism in Ukrainian. Using t-SNE for visualization and DBSCAN for clustering on a dataset derived from the GRAC corpus, we analyze the structure of the vector space for Ukrainian nouns, comparing findings to previous work on Russian and conducting a direct comparison using translational equivalents. Our results show that while grammatical number provides a relatively strong organizing principle for Ukrainian nouns, case representation is less distinct, with substantial Acc/Gen overlap observed across multiple semantic classes, reflecting the pervasiveness of linguistic alternation rules (e.g., partitive, animacy-based, negation). A focused DBSCAN analysis of the Acc/Gen overlap zone revealed mostly diffuse similarity ("noise") but also distinct dense clusters comprising primarily AccSg, GenSg, and GenPl forms, notably excluding AccPl forms. Furthermore, a comparative t-SNE visualization showed that Russian translational counterparts of the Ukrainian nouns from these dense clusters exhibit a similarly high degree of Acc/Gen mixing. The findings highlight both potentially language-specific organizational principles (number dominance) and shared distributional tendencies between Ukrainian and Russian in representing Acc/Gen functional overlap, suggesting that common underlying semantic factors are captured despite finer-grained grammatical differences.
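To make the distinction between diffuse "noise" and dense clusters concrete, the following is a minimal sketch of density-based clustering in plain Python. It is not the authors' pipeline: the abstract does not state the DBSCAN parameters, the embedding dimensionality, or the software used, so the function names `dbscan` and `case_composition`, the values of `eps` and `min_samples`, and the two-dimensional toy points with their case tags are all assumptions invented for illustration. In practice one would more likely apply a library implementation to the actual fastText vectors.

```python
from collections import Counter
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_samples):
    """Return one cluster label per point (NOISE for unclustered points)."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


def case_composition(labels, tags):
    """Count grammatical tags per cluster label."""
    composition = {}
    for label, tag in zip(labels, tags):
        composition.setdefault(label, Counter())[tag] += 1
    return composition


# Hypothetical toy data: 2-D stand-ins for embedding vectors, NOT real results.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # first dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),              # second dense group
    (2.5, 9.0), (9.0, 1.0),                          # isolated points
]
tags = ["AccSg", "GenSg", "GenSg", "AccSg",
        "GenPl", "GenPl", "GenSg",
        "AccPl", "AccPl"]

labels = dbscan(points, eps=0.3, min_samples=3)  # assumed parameters
composition = case_composition(labels, tags)
for label, counts in sorted(composition.items()):
    print(label, dict(counts))
```

Tabulating tags per cluster label, as `case_composition` does, is one simple way to see which case and number forms share a dense region and which are left as noise. The toy tags are arranged by hand and say nothing about the paper's actual findings.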
