Understanding Transformer-Based OCR for Medieval Manuscripts: A Systematic Ablation Study and Inspection Analysis
Abstract
Adapting Transformer-based Optical Character Recognition (TrOCR) models to medieval manuscripts requires bridging a significant domain gap. This work provides a systematic investigation of TrOCR fine-tuning strategies using a 14th-15th century Italian manuscript. We conduct controlled ablation studies on preprocessing, data augmentation, and encoder layer freezing. Results demonstrate that full fine-tuning of all encoder layers is critical, achieving an 11.68% Character Error Rate (CER). We also show that Contrast-Limited Adaptive Histogram Equalization (CLAHE) preprocessing yields a 12.9% relative CER reduction. Our hyperparameter configuration generalized effectively, achieving 7.68% CER on the public READ-16 benchmark. As a key contribution, we perform a quantitative analysis of model localization maps. We establish that encoder-based Grad-CAM entropy and Gini impurity are much stronger correlates of token prediction loss than decoder cross-attention, and we propose them as a robust diagnostic for visual uncertainty. This finding has direct applications for uncertainty sampling in active learning and for pseudo-label filtering in semi-supervised learning workflows. This study offers both practical guidelines for adapting TrOCR and a novel method for interpreting model adaptation.
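The CLAHE preprocessing mentioned above equalizes contrast locally, per image tile, which helps with the uneven ink density and parchment staining typical of manuscript scans. The following is a minimal NumPy-only sketch of the idea; it is not the paper's pipeline (which would typically call an off-the-shelf implementation such as OpenCV's `createCLAHE`), and it omits the bilinear interpolation between tile mappings that full CLAHE uses to hide tile seams. The `clip_limit` and `tiles` parameters are illustrative defaults, not the study's configuration.

```python
import numpy as np

def clahe_simplified(img, clip_limit=40, tiles=(8, 8), nbins=256):
    """Simplified CLAHE for a grayscale uint8 image.

    Per tile: clip the histogram at `clip_limit`, redistribute the
    clipped excess uniformly, then histogram-equalize the tile with
    the resulting CDF. (Full CLAHE additionally interpolates between
    neighboring tile mappings.)
    """
    h, w = img.shape
    th, tw = h // tiles[0], w // tiles[1]
    out = img.copy()
    for i in range(tiles[0]):
        for j in range(tiles[1]):
            tile = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            hist, _ = np.histogram(tile, bins=nbins, range=(0, nbins))
            # Clip the histogram and spread the excess across all bins.
            excess = np.maximum(hist - clip_limit, 0).sum()
            hist = np.minimum(hist, clip_limit) + excess // nbins
            # Build the equalizing lookup table from the clipped CDF.
            cdf = hist.cumsum().astype(np.float64)
            cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = \
                cdf[tile].astype(img.dtype)
    return out
```

In practice one would prefer a library implementation, but the sketch shows why CLAHE suits manuscripts: each tile's contrast is stretched relative to its own local background rather than the page-wide histogram.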
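The proposed uncertainty diagnostic treats a Grad-CAM localization map as a probability distribution over image positions and scores how diffuse it is. A sketch of the two statistics named in the abstract, under the assumption that the map is non-negative and is normalized to sum to one before scoring (the function name and normalization choice are illustrative, not taken from the paper):

```python
import numpy as np

def localization_uncertainty(cam):
    """Entropy and Gini impurity of a Grad-CAM map.

    `cam` is a non-negative 2-D array with a positive sum. A sharply
    peaked map (confident localization) scores near 0 on both measures;
    a diffuse map (visual uncertainty) scores high.
    """
    p = cam.ravel().astype(np.float64)
    p = p / p.sum()                          # normalize to a distribution
    entropy = -np.sum(p * np.log(p + 1e-12)) # Shannon entropy (nats)
    gini = 1.0 - np.sum(p ** 2)              # Gini impurity
    return entropy, gini
```

A downstream active-learning loop could rank unlabeled lines by these scores and send the most diffuse ones for annotation, or drop high-scoring pseudo-labels in a semi-supervised setup, which is the use the abstract proposes.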