Parsing Old English with Universal Dependencies—The Impacts of Model Architectures and Dataset Sizes
Abstract
This study presents the first systematic empirical comparison of neural architectures for Universal Dependencies (UD) parsing in Old English, addressing central questions in computational historical linguistics and low-resource language processing. We evaluate three approaches—a baseline spaCy pipeline, a pipeline with a pretrained tok2vec component, and a MobileBERT transformer-based model—across datasets ranging from 1,000 to 20,000 words. Our results demonstrate that the pretrained tok2vec model consistently outperforms the alternatives, achieving 83.24% UAS and 74.23% LAS on the largest dataset, whereas the transformer-based approach substantially underperforms despite higher computational costs. Performance analysis shows that basic tagging tasks reach 85–90% accuracy, while dependency parsing achieves approximately 75% accuracy. We identify critical scaling thresholds, with substantial improvements between 1,000 and 5,000 words and diminishing returns beyond 10,000 words, providing insight into scaling laws for historical languages. Technical analysis indicates that the transformer's poor performance stems from a parameter-to-data ratio mismatch (1250:1) and from the distinctive orthographic and morphological characteristics of Old English. These findings challenge assumptions about transformer superiority in low-resource scenarios and establish evidence-based guidelines for researchers working with historical languages. The broader significance of this study extends to enabling automated analysis of the three million words of extant Old English text and to providing a framework for optimal architecture selection in data-constrained environments. Our results suggest that medium-complexity architectures with monolingual pretraining offer a superior cost–benefit trade-off compared to complex transformer models for historical language processing.
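For readers unfamiliar with the reported metrics, the sketch below shows how unlabeled and labeled attachment scores (UAS/LAS) of the kind cited above are conventionally computed from gold and predicted CoNLL-U files. It is an illustration under stated assumptions, not the paper's evaluation code: it assumes the two files share identical tokenization, and the file names and helper functions are hypothetical.

```python
"""Minimal UAS/LAS sketch for aligned gold and predicted CoNLL-U files."""


def read_conllu(path):
    """Yield (head, deprel) pairs for each token line in a CoNLL-U file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip comments, blank sentence separators, multi-word ranges
            # (IDs like "1-2"), and empty nodes (IDs like "1.1").
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            yield cols[6], cols[7]  # HEAD and DEPREL columns


def attachment_scores(gold_path, pred_path):
    """Return (UAS, LAS): the share of tokens with the correct head,
    and with both the correct head and dependency label."""
    total = uas_hits = las_hits = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(read_conllu(gold_path),
                                                read_conllu(pred_path)):
        total += 1
        if g_head == p_head:
            uas_hits += 1
            if g_rel == p_rel:
                las_hits += 1
    return uas_hits / total, las_hits / total


if __name__ == "__main__":
    # Hypothetical file names for an Old English test split.
    uas, las = attachment_scores("oe_test_gold.conllu", "oe_test_pred.conllu")
    print(f"UAS: {uas:.2%}  LAS: {las:.2%}")
```

As a further orientation point, the 1250:1 parameter-to-data ratio quoted above is consistent with dividing a MobileBERT-scale parameter count of roughly 25 million by the 20,000-word training set (25,000,000 / 20,000 = 1,250); the exact parameter count is our assumption rather than a figure stated in the abstract.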