Contrastive Self-Supervised Learning with Domain-Specific Augmentation for Script Classification of Chinese Ancient Manuscripts

Abstract

Classification of Chinese ancient manuscripts is crucial for historical and cultural preservation, but it is difficult because of complex script variability, material degradation, and the scarcity of annotated data. This study proposes a novel framework that uses contrastive self-supervised learning to identify and classify the scripts of Chinese ancient manuscripts, including Seal, Clerical, Cursive, Regular, and Running scripts. Drawing on publicly accessible datasets, particularly the CASIA Ancient Chinese Handwritten Character Database, the HCL2000 Historical Chinese Literature Dataset, and the HUSAM-SinoCDCS Collection, our method integrates domain-specific augmentations that simulate realistic manuscript conditions such as faded ink, fragmentation, and surface damage. Experimental results show that the proposed framework consistently outperforms traditional supervised learning baselines, achieving higher accuracy and robustness even when few labeled examples are available. These results advance computational methods for ancient document analysis and provide practical tools for digital humanities efforts to preserve Chinese cultural heritage.
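
The abstract does not specify how the degradation augmentations are parameterized or which contrastive objective is used. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a SimCLR-style setup in which two randomly degraded views of the same character image form a positive pair and are scored with the NT-Xent loss. All function names (fade_ink, fragment, surface_damage, augment, nt_xent) and parameter values are hypothetical. Images are plain lists of grayscale rows with 1.0 as paper and 0.0 as ink.

```python
import math
import random

# Hypothetical degradation augmentations (assumed, not from the paper).
# Image = list of rows, pixel values in [0.0, 1.0]; 1.0 = paper, 0.0 = ink.

def fade_ink(img, strength):
    """Simulate faded ink by pulling every pixel toward the paper color."""
    return [[p + (1.0 - p) * strength for p in row] for row in img]

def fragment(img, rng, max_frac=0.3):
    """Simulate a lost fragment by blanking a random rectangle (copy, no mutation)."""
    h, w = len(img), len(img[0])
    ph = max(1, int(h * rng.uniform(0.1, max_frac)))
    pw = max(1, int(w * rng.uniform(0.1, max_frac)))
    top, left = rng.randrange(h - ph + 1), rng.randrange(w - pw + 1)
    out = [row[:] for row in img]
    for i in range(top, top + ph):
        for j in range(left, left + pw):
            out[i][j] = 1.0
    return out

def surface_damage(img, rng, prob=0.05):
    """Simulate stains or specks by setting random pixels to mid-gray."""
    return [[0.5 if rng.random() < prob else p for p in row] for row in img]

def augment(img, rng):
    """Compose the degradations with random strength to produce one view."""
    view = fade_ink(img, rng.uniform(0.0, 0.5))
    if rng.random() < 0.5:
        view = fragment(view, rng)
    return surface_damage(view, rng)

def _normalize(v):
    """Scale a vector to unit length (assumes a nonzero vector)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def nt_xent(z, temperature=0.5):
    """SimCLR-style NT-Xent loss; rows 2k and 2k+1 of z are a positive pair."""
    n = len(z)
    norm = [_normalize(row) for row in z]
    sim = [[sum(a * b for a, b in zip(norm[i], norm[j])) / temperature
            for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = i ^ 1  # partner index: 0<->1, 2<->3, ...
        denom = sum(math.exp(sim[i][k]) for k in range(n) if k != i)
        total += -math.log(math.exp(sim[i][pos]) / denom)
    return total / n
```

In a full pipeline, each image would pass through augment twice, and an encoder (not shown) would map both views to embeddings that are interleaved as rows of z. The loss is low when the two views of the same character embed close together and far from other characters. This is what lets unlabeled manuscript images shape the representation before a classifier is trained on the few labeled examples.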