A Hybrid Rule-Based and Machine Learning Morphological Analyzer for the Kangri Language Using UD Treebank
Abstract
Morphological analysis is a foundational task in natural language processing (NLP) and is particularly challenging for low-resource, morphologically rich languages such as Kangri. Despite a substantial number of speakers, Kangri lacks annotated corpora, computational tools, and lexicons, which makes linguistic analysis and downstream processing difficult. This paper presents a hybrid morphological analyzer for Kangri that integrates rule-based suffix analysis, lexicon extraction, and efficient machine learning models. A lexicon and suffix transformation rules were automatically induced from the Universal Dependencies (UD) Kangri Treebank. The rule-based morphological analyzer achieved an accuracy of 59% on the UD test set. A machine learning baseline using TF-IDF character n-grams with Logistic Regression achieved 64.61% accuracy, and an enhanced model incorporating POS tags improved this to 67.40%. The results demonstrate that combining linguistic heuristics with statistical learning substantially improves lemma prediction and morphological interpretation for Kangri. This work establishes an initial computational morphology framework for Kangri and provides a foundation for further NLP tool development.
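To make the rule-based component concrete, the following is a minimal sketch of how a lexicon and suffix transformation rules could be induced from (word form, lemma) pairs and then applied. It is an illustration under stated assumptions, not the authors' implementation: the function names (build_lexicon, induce_suffix_rules, lemmatize) are hypothetical, the rule format (strip the longest common prefix and record the remaining form suffix and lemma suffix) is one common choice among several, and the training pairs are toy English words standing in for pairs that would be read from the treebank's FORM and LEMMA columns.

```python
from collections import Counter, defaultdict

# Toy (form, lemma) pairs. These are English placeholders, NOT Kangri data;
# in practice the pairs would come from the treebank's FORM and LEMMA columns.
pairs = [
    ("walking", "walk"),
    ("talking", "talk"),
    ("jumped", "jump"),
    ("played", "play"),
]


def build_lexicon(pairs):
    """Map each seen form to its most frequent lemma."""
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form][lemma] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def induce_suffix_rules(pairs):
    """Count form-suffix -> lemma-suffix rewrites observed in the pairs."""
    rules = defaultdict(Counter)
    for form, lemma in pairs:
        # Length of the longest common prefix of form and lemma.
        k = 0
        while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
            k += 1
        form_suffix, lemma_suffix = form[k:], lemma[k:]
        if form_suffix:  # skip pairs with nothing to strip from the form
            rules[form_suffix][lemma_suffix] += 1
    return rules


def lemmatize(form, lexicon, rules):
    """Lexicon lookup first, then the longest matching suffix rule."""
    if form in lexicon:
        return lexicon[form]
    # Try suffixes from longest to shortest, keeping at least one stem character.
    for start in range(1, len(form)):
        suffix = form[start:]
        if suffix in rules:
            replacement = rules[suffix].most_common(1)[0][0]
            return form[:start] + replacement
    return form  # no rule applies: fall back to the form itself


lexicon = build_lexicon(pairs)
rules = induce_suffix_rules(pairs)

for word in ["walking", "singing", "kicked", "tree"]:
    print(word, "->", lemmatize(word, lexicon, rules))
```

In this sketch, unseen forms are handled by the longest matching suffix rule, and forms that match no rule are returned unchanged. A statistical model such as the character n-gram classifier described above could then be consulted for the cases where the rules are ambiguous or silent.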