Leveraging Character Stitching and Generative AI to improve Kannada Handwritten Text Recognition



Abstract

Kannada is one of the world's many low-resource languages. It is spoken predominantly in the state of Karnataka, India. Although it has more than 44 million native speakers, Kannada, like other Indian languages, lags behind Western languages such as English in research and resources. Handwriting recognition is a crucial area of Natural Language Processing (NLP). The Kannada script is highly complex because of its many diacritics, which makes manual data collection difficult. Generative AI has recently seen a massive boom, especially in image generation, with models such as Autoencoders, Generative Adversarial Networks (GANs), and Diffusion models. This research proposes a novel application of Character Stitching and StarGAN. StarGAN, an image-to-image GAN model, is used to synthetically generate Kannada kagunita images (consonants combined with vowel signs) from the base character. Character stitching is used to create a synthetic word dataset from an existing character dataset. The approach shows promising results. Character stitching alone improves the recognition accuracy of simple Kannada words from 43.27% to 87.13%, and using StarGAN improves accuracy on more complex words from 45.59% to 79.12%.
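To make the idea of character stitching concrete, here is a minimal sketch of how isolated character images could be joined into synthetic word images with their labels. It rests on several assumptions that are not from the abstract. Characters are 2-D grayscale NumPy arrays on a white (255) background. Characters are placed left to right, vertically centred, with a fixed gap. Word length and characters are sampled at random. The helper names (`pad_to_height`, `stitch_characters`, `make_synthetic_words`) and their parameters are hypothetical. The authors' actual procedure may differ, for example by sampling from a Kannada lexicon or by aligning characters on a baseline.

```python
import numpy as np


def pad_to_height(img: np.ndarray, height: int, background: int = 255) -> np.ndarray:
    """Vertically centre a 2-D grayscale image inside a canvas of the given height."""
    h, w = img.shape
    top = (height - h) // 2
    canvas = np.full((height, w), background, dtype=img.dtype)
    canvas[top:top + h, :] = img
    return canvas


def stitch_characters(char_images, labels, gap=4, background=255):
    """Join character images left to right into one synthetic word image.

    Assumption: simple horizontal concatenation with vertical centring.
    Returns (word_image, word_label), where the label is the character
    labels concatenated in the same order.
    """
    if not char_images:
        raise ValueError("need at least one character image")
    height = max(img.shape[0] for img in char_images)
    pieces = []
    for i, img in enumerate(char_images):
        pieces.append(pad_to_height(img, height, background))
        # Insert a blank spacer column block between characters, not after the last one.
        if i < len(char_images) - 1 and gap > 0:
            pieces.append(np.full((height, gap), background, dtype=img.dtype))
    return np.hstack(pieces), "".join(labels)


def make_synthetic_words(char_dataset, n_words, min_len=2, max_len=5, seed=0):
    """Build (image, label) word samples from a dict: label -> list of character images.

    Hypothetical sampling scheme: uniform random length and random characters.
    """
    rng = np.random.default_rng(seed)
    keys = list(char_dataset)
    words = []
    for _ in range(n_words):
        length = int(rng.integers(min_len, max_len + 1))  # upper bound is exclusive
        chosen = [keys[int(k)] for k in rng.integers(0, len(keys), size=length)]
        # Pick one handwritten sample per chosen character label.
        imgs = [char_dataset[c][int(rng.integers(0, len(char_dataset[c])))] for c in chosen]
        words.append(stitch_characters(imgs, chosen))
    return words
```

For example, stitching a 30×20 and a 40×25 character image with the default 4-pixel gap yields a 40×49 word image whose label is the two character labels joined. Centring vertically keeps characters of different heights roughly aligned. A baseline-aware placement would probably suit Kannada better, since some vowel signs and conjuncts extend above or below the main character body.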
