A Modified Vision Transformer for Kurdish Cursive RTL Handwritten Text Recognition

Abstract

Handwritten text recognition (HTR) for low-resource cursive scripts remains a major challenge in document image analysis. This research addresses the lack of annotated data and effective recognition systems for Central Kurdish, also known as Sorani, a complex script with 34 letters, rich ligatures, and context-dependent diacritical marks; to our knowledge, this is the first time the script has been studied in HTR research. We present DASNUS, a new large-scale dataset of 11,475 annotated text lines from 867 writers across the Kurdistan Region of Iraq. We also propose a deep learning framework for HTR that combines a ResNet-inspired convolutional encoder with a Vision Transformer. The model is particularly suited to the writer variability, long-range dependencies, and ligatures inherent in the Sorani script. Training employed several techniques, including span-based masking, geometric and photometric transformations, and depth regularization of the transformer via DropPath and LayerScale. The proposed model achieves a character error rate (CER) of 3.47% and a word error rate (WER) of 17.37%, comparable to or better than state-of-the-art models for Arabic, Persian, and English HTR. This work establishes the first benchmark for Kurdish cursive handwriting and demonstrates that well-regularized transformer-based models can effectively recognize complex, low-resource cursive scripts. The results are expected to pave the way for future research in multilingual OCR, writer adaptation, synthetic data generation, and inclusive AI.
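The CER and WER figures reported above are conventionally computed as Levenshtein (edit) distance normalized by reference length, at the character and word level respectively. The paper's exact evaluation script is not shown here; the following is a minimal sketch of the standard definitions:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance counting
    # insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    # Character error rate: character-level edit distance
    # divided by the reference length.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    # Word error rate: the same distance over whitespace-split tokens.
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Both functions work unchanged on right-to-left text, since edit distance operates on the logical (storage) order of Unicode code points, not on visual order.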
