Part-of-Speech Tagging for the Kangri Language Using CRF and BiLSTM Models: A Comprehensive Comparative Study

Abstract

Part-of-Speech (POS) tagging is a core task in natural language processing (NLP) and a crucial building block for higher-level applications such as parsing and machine translation. For low-resource, morphologically rich languages such as Kangri, POS tagging remains challenging due to scarce annotated corpora and limited linguistic resources. This paper presents a comparative study of three POS tagging approaches for Kangri: a feature-based Conditional Random Field (CRF), an untuned Bidirectional Long Short-Term Memory (BiLSTM) baseline, and a hyperparameter-tuned BiLSTM. All models are trained and evaluated on the Universal Dependencies (UD) Kangri Treebank. The feature-based CRF achieves strong test-set performance (70.4% accuracy and a weighted F1 of 0.695), the untuned BiLSTM provides a solid neural baseline (66.0% accuracy), and the hyperparameter-tuned BiLSTM reaches higher validation accuracy during tuning. We analyze per-tag strengths and weaknesses and training dynamics, and we provide recommendations for future improvements in low-resource POS tagging.
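To make the "feature-based CRF" concrete, the sketch below shows the kind of token-level feature extraction such taggers commonly rely on (surface form, affixes, and neighboring words). The specific feature set, function names, and the placeholder sentence are illustrative assumptions, not the paper's actual implementation; real input would come from the UD Kangri Treebank.

```python
# Minimal sketch of CRF-style feature extraction for POS tagging.
# Feature names and the example sentence are hypothetical.

def word2features(sent, i):
    """Build a feature dict for the i-th token of a tokenized sentence."""
    word = sent[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:],       # short suffixes help with rich morphology
        "suffix2": word[-2:],
        "is_digit": word.isdigit(),
        "is_title": word.istitle(),
    }
    if i > 0:
        features["prev_word"] = sent[i - 1].lower()
    else:
        features["BOS"] = True      # sentence-initial marker
    if i < len(sent) - 1:
        features["next_word"] = sent[i + 1].lower()
    else:
        features["EOS"] = True      # sentence-final marker
    return features

def sent2features(sent):
    """Feature dicts for every token in the sentence."""
    return [word2features(sent, i) for i in range(len(sent))]

# Placeholder tokens (not real Kangri data) to show the output shape.
sent = ["The", "dog", "runs"]
feats = sent2features(sent)
print(feats[0].get("BOS"), feats[1]["prev_word"], feats[2].get("EOS"))
```

These per-token dicts would then be fed to a linear-chain CRF trainer (e.g., a crfsuite-style library) alongside the gold UD tags.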