A Modified Vision Transformer for Kurdish Cursive RTL Handwritten Text Recognition

Abstract

Handwritten text recognition (HTR) for low-resource cursive scripts remains a major challenge in document image analysis. This research addresses the lack of annotated data and effective recognition systems for Central Kurdish, also known as Sorani, a complex script with 34 letters, rich ligatures, and context-dependent diacritical marks; to our knowledge, this is the first time the script has been studied in HTR research. We present DASNUS, a new large-scale dataset of 11,475 annotated text lines from 867 writers across the Kurdistan Region of Iraq. We also propose a deep learning framework for HTR that combines a ResNet-inspired convolutional encoder with a Vision Transformer. The model is particularly suited to the writer variability, long-range dependencies, and ligatures inherent in the Sorani script. Training employed several techniques, including span-based masking, geometric and photometric transformations, and depth regularization of the transformer via DropPath and LayerScale. The proposed model achieves a character error rate (CER) of 3.47% and a word error rate (WER) of 17.37%, comparable to or better than state-of-the-art models for Arabic, Persian, and English HTR. This work establishes the first benchmark for Kurdish cursive handwriting and demonstrates that well-regularized transformer-based models can effectively recognize complex, low-resource cursive scripts. The results are expected to pave the way for future research in multilingual OCR, writer adaptation, synthetic data generation, and inclusive AI.
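The CER and WER figures reported above are conventionally computed as Levenshtein (edit) distance normalized by reference length, at the character and word level respectively. The paper's exact evaluation script is not shown here; the following is a minimal sketch of the standard definitions:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance counting
    # insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    # Character error rate: character-level edit distance
    # divided by the reference length.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    # Word error rate: the same distance over whitespace-split tokens.
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Both functions work unchanged on right-to-left text, since edit distance operates on the logical (storage) order of Unicode code points, not on visual order.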
