ST-WID: Self-Supervised Transformer for Writer Identification in Arabic Handwritten Scripts
Abstract
Despite significant advances in handwriting analysis, writer identification in Arabic manuscripts remains challenging due to the scarcity of large-scale annotated datasets. Traditional methods, which rely primarily on supervised learning and convolutional architectures, often struggle under such low-resource conditions, whereas unlabeled Arabic handwriting data is often available in large quantities. To address this gap, we propose an end-to-end Vision Transformer (ViT)-based framework for Arabic writer identification. Our approach leverages self-supervised learning during pretraining to acquire robust and transferable feature representations from unlabeled handwritten words, followed by a supervised fine-tuning phase for the writer identification task. Additionally, we integrate a synthetic data generation strategy to further mitigate data scarcity. The proposed framework achieves state-of-the-art performance on two benchmark Arabic datasets, IFN/ENIT and AHTID/MW.
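To make the two-phase recipe concrete, the sketch below shows one possible instantiation in PyTorch: a small ViT encoder is first pretrained on unlabeled word images with a masked-patch reconstruction objective, then fine-tuned with a writer-classification head. This is only an illustrative sketch, not the authors' model; the abstract does not specify the self-supervised objective, and all architecture sizes, hyperparameters, and names (e.g. TinyViT, pretrain_step, finetune_step) are assumptions made here for illustration.

```python
# Minimal sketch of the pretrain-then-fine-tune pipeline described in the abstract.
# Assumptions: a masked-patch reconstruction SSL objective, grayscale 64x64 word
# crops, and a tiny ViT; none of these are fixed by the paper itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyViT(nn.Module):
    """Patch embedding + Transformer encoder over grayscale word images."""
    def __init__(self, img_size=64, patch=8, dim=192, depth=4, heads=3):
        super().__init__()
        self.patch = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                                   # x: (B, 1, H, W)
        tokens = self.patch(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        return self.encoder(tokens + self.pos)              # (B, N, dim)


def pretrain_step(vit, decoder, imgs, patch=8, mask_ratio=0.5):
    """Self-supervised step: zero out random patches, reconstruct their pixels."""
    B, _, H, W = imgs.shape
    n = (H // patch) * (W // patch)
    mask = torch.rand(B, n, device=imgs.device) < mask_ratio       # True = masked
    target = F.unfold(imgs, patch, stride=patch).transpose(1, 2)   # (B, N, patch*patch)
    corrupted = target.masked_fill(mask.unsqueeze(-1), 0.0)
    corrupted = F.fold(corrupted.transpose(1, 2), (H, W), patch, stride=patch)
    pred = decoder(vit(corrupted))                                 # (B, N, patch*patch)
    return F.mse_loss(pred[mask], target[mask])                    # loss on masked patches only


def finetune_step(vit, head, imgs, writer_ids):
    """Supervised step: mean-pool encoder tokens and classify the writer."""
    logits = head(vit(imgs).mean(dim=1))
    return F.cross_entropy(logits, writer_ids)


if __name__ == "__main__":
    num_writers = 411                      # illustrative; set to the dataset's writer count
    vit = TinyViT()
    decoder = nn.Linear(192, 64)           # predicts the 8x8 pixels of each patch
    head = nn.Linear(192, num_writers)     # writer-identification head
    unlabeled = torch.rand(4, 1, 64, 64)   # stand-in for unlabeled word images
    labeled = torch.rand(4, 1, 64, 64)
    ids = torch.randint(0, num_writers, (4,))
    print("pretrain loss:", pretrain_step(vit, decoder, unlabeled).item())
    print("finetune loss:", finetune_step(vit, head, labeled, ids).item())
```

In such a setup, the same encoder weights would be carried from the pretraining phase into fine-tuning; synthetic word images, as mentioned in the abstract, could simply be added to either stage's training pool.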