A Hybrid Rule-Based and Machine Learning Morphological Analyzer for the Kangri Language Using UD Treebank

Prateek Kaushal

Abstract

Morphological analysis is a foundational task in natural language processing (NLP) and is particularly challenging for low-resource, morphologically rich languages such as Kangri. Despite substantial numbers of speakers, Kangri lacks annotated corpora, computational tools, and lexicons, making linguistic analysis and downstream processing difficult. This paper presents a hybrid morphological analyzer for the Kangri language that integrates rule-based suffix analysis, lexicon extraction, and efficient machine learning models. A lexicon and suffix transformation rules were automatically induced from the Universal Dependencies (UD) Kangri Treebank. The rule-based morphological analyzer achieved an accuracy of 59% on the UD test set. A machine learning baseline using TF-IDF character n-grams with Logistic Regression achieved 64.61% accuracy, while an enhanced model incorporating POS tags improved performance to 67.40%. The results demonstrate that combining linguistic heuristics with statistical learning substantially improves lemma prediction and morphological interpretation for Kangri. This work establishes an initial computational morphology framework for Kangri and provides a foundation for further NLP tool development.
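
The rule-based component described in the abstract (a lexicon plus suffix transformation rules induced from form and lemma pairs) can be illustrated with a minimal sketch. This is not the author's implementation: the function names (`induce_rule`, `train`, `lemmatize`, `accuracy`), the longest-common-prefix heuristic, and the tie-breaking by suffix length and frequency are all assumptions made for illustration. The toy pairs are English placeholders, not Kangri data; in practice the pairs would come from the FORM and LEMMA columns of a UD treebank.

```python
from collections import Counter, defaultdict


def induce_rule(form, lemma):
    """Strip the longest common prefix and return (form_suffix, lemma_suffix)."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


def train(pairs):
    """Build a form -> lemma-count lexicon and a frequency table of suffix rules."""
    lexicon = defaultdict(Counter)
    rules = Counter()
    for form, lemma in pairs:
        lexicon[form][lemma] += 1
        rules[induce_rule(form, lemma)] += 1
    return lexicon, rules


def lemmatize(form, lexicon, rules):
    """Lexicon lookup first; otherwise apply the longest, most frequent suffix rule."""
    if form in lexicon:
        return lexicon[form].most_common(1)[0][0]
    candidates = [
        (len(strip), count, strip, add)
        for (strip, add), count in rules.items()
        if strip and form.endswith(strip) and len(form) > len(strip)
    ]
    if not candidates:
        return form  # fall back to the surface form when no rule applies
    _, _, strip, add = max(candidates)
    return form[: len(form) - len(strip)] + add


def accuracy(pairs, lexicon, rules):
    """Fraction of forms whose predicted lemma matches the gold lemma."""
    correct = sum(lemmatize(form, lexicon, rules) == lemma for form, lemma in pairs)
    return correct / len(pairs)


# Hypothetical English toy data standing in for treebank (form, lemma) pairs.
toy_train = [("walked", "walk"), ("jumped", "jump"), ("cats", "cat"), ("dogs", "dog")]
toy_test = [("talked", "talk"), ("birds", "bird"), ("went", "go")]

lexicon, rules = train(toy_train)
print(lemmatize("talked", lexicon, rules))
print(accuracy(toy_test, lexicon, rules))
```

The irregular pair in the toy test set is missed because no induced suffix rule covers it. Unseen and irregular forms of this kind are where a statistical model over character n-grams, such as the one the abstract describes, could plausibly help.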

Version published to 10.21203/rs.3.rs-8299268/v1 on Research Square
Dec 31, 2025

Related articles

Part-of-Speech Tagging for the Kangri Language Using CRF and BiLSTM Models: A Comprehensive Comparative Study

This article has 1 author:
1. Prateek Kaushal
This article has no evaluations
Latest version Jan 6, 2026

Integrating HPSG (Head-driven Phrase Structure Grammar) with Neural Parsing for Bengali

This article has 1 author:
1. Maneesha Rani Biswas
This article has no evaluations
Latest version Jan 16, 2026

Grammar-Driven Text Segmentation for Context Understanding of Myanmar Language

This article has 3 authors:
1. myo thida
2. Nu Wei Thet
3. Thein Kyaw LWIN
This article has no evaluations
Latest version Jan 23, 2026
