Language-specific embeddings of Old English with character-level processing

Abstract

This article seeks to narrow the gap between philological research and computational linguistics by providing a neural network model for the analysis of Old English. It makes three contributions: (i) language-specific word embeddings derived directly from The Dictionary of Old English Corpus; (ii) a comparative analysis of character-level and word-level models, which shows that character-level processing performs better for morphologically rich historical languages; and (iii) a comprehensive Stanza-based pipeline that outperforms previous approaches to Old English parsing. Our model achieves an Unlabeled Attachment Score of 88.92% and a Labeled Attachment Score of 79.65% on dependency parsing tasks, an improvement of approximately 20 percentage points over previous state-of-the-art multilingual approaches. The main conclusion is that language-specific resources and character-level modeling are more effective for Old English processing than cross-linguistic transfer learning. This opens new avenues for computational research in historical linguistics and digital humanities.
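For readers unfamiliar with the two reported metrics, the sketch below shows how Unlabeled and Labeled Attachment Scores are conventionally computed. It is a minimal illustration, not the article's evaluation code: the function name `attachment_scores` and the four-token toy sentence are assumptions made for this example, and the labels are ordinary Universal Dependencies relation names rather than data from the article.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 conventionally marks the root of the sentence.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned gold and predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical toy sentence of four tokens (illustrative values only).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In the toy sentence, three of four heads are correct and two of four head-and-label pairs are correct. Because a label can only count when its head is also right, LAS can never exceed UAS, which is consistent with the gap between the 88.92% and 79.65% figures reported above.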
