Part-of-Speech Tagging for the Kangri Language Using CRF and BiLSTM Models: A Comprehensive Comparative Study
Abstract
Part-of-Speech (POS) tagging is a core task in natural language processing (NLP) and a crucial building block for higher-level applications such as parsing and machine translation. For low-resource, morphologically rich languages such as Kangri, POS tagging remains challenging due to scarce annotated corpora and limited linguistic resources. This paper presents a comparative study of three POS tagging approaches for Kangri: a feature-based Conditional Random Field (CRF), an untuned Bidirectional Long Short-Term Memory (BiLSTM) baseline, and a hyperparameter-tuned BiLSTM. All models are trained and evaluated on the Universal Dependencies (UD) Kangri Treebank. The CRF achieves strong test-set performance (70.4% accuracy, weighted F1 of 0.695), the untuned BiLSTM provides a solid neural baseline (66.0% accuracy), and the hyperparameter-tuned BiLSTM reaches higher validation accuracy during tuning. We analyze per-tag strengths and weaknesses, examine training dynamics, and offer recommendations for future improvements in low-resource POS tagging.
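Feature-based CRF taggers of the kind compared in this study typically operate over hand-crafted per-token features such as affixes and neighboring words, which is what makes them attractive when annotated data is scarce. The sketch below shows one plausible feature extractor for such a tagger; the feature names and the example tokens are illustrative assumptions, not taken from the paper's actual feature set.

```python
def token_features(sentence, i):
    """Hand-crafted features for token i of a tokenized sentence.

    These feature templates (affixes, position flags, context words) are a
    common choice for CRF POS taggers; the exact set here is hypothetical.
    """
    word = sentence[i]
    return {
        "word": word,
        "prefix2": word[:2],          # leading characters capture some morphology
        "suffix2": word[-2:],         # suffixes are informative in inflecting languages
        "suffix3": word[-3:],
        "is_first": i == 0,           # sentence-initial position
        "is_last": i == len(sentence) - 1,
        "prev_word": sentence[i - 1] if i > 0 else "<BOS>",
        "next_word": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    # Illustrative Devanagari-script tokens (not drawn from the UD Kangri Treebank).
    sent = ["मैं", "घर", "जा", "रहा"]
    feats = [token_features(sent, i) for i in range(len(sent))]
    print(feats[0]["prev_word"], feats[1]["word"], feats[3]["next_word"])
```

In a full pipeline, these per-token feature dictionaries for each sentence would be passed to a linear-chain CRF trainer, which learns tag-transition and feature weights jointly.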