HADOCR: A Hierarchical Attention-Driven Framework with Dynamic Sampling for Ancient Chinese Medical Literature Recognition

Abstract

The digitization of ancient Chinese medical literature faces significant challenges due to complex typefaces, irregular layouts, document degradation, and writing styles that vary across dynasties. This research addresses the limitations of existing scene text recognition (STR) methods, which perform poorly on historical texts, especially medical manuscripts. The primary research question is how to design a robust and effective method for ancient Chinese medical text recognition.

We propose HADOCR, a novel end-to-end framework for ancient Chinese medical text recognition. The model uses a Vision Transformer (ViT) for visual feature extraction, followed by a Feature Fusion Block (FFB) that mixes local and global features. Two new modules are introduced: Dynamic Ratio Sampling (DRS), which adapts sampling across multiple scales while preserving aspect ratios and structural features, and Dual-Attention Feature Rearrangement (DAFR), which applies hierarchical attention to better handle character deformation and irregular text arrangements.

HADOCR demonstrates superior performance in ancient Chinese medical text recognition, significantly improving feature preservation and the handling of spatial deformations. The ACML dataset introduced with this work, containing over 100,000 instances, provides a valuable resource for future research in historical document recognition.

Our work paves the way for better digital preservation and knowledge mining of traditional Chinese medical literature and has broader applications in historical document recognition.
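
The abstract does not say how DRS is implemented. As a rough illustration of the general idea it names, resizing a text image without distorting its aspect ratio, the sketch below computes a target size for a patch-based encoder such as a ViT. The function name, the default height of 32, the patch size of 4, and the width cap of 512 are all assumptions made for this example and are not taken from the paper.

```python
def aspect_preserving_size(width, height, target_height=32, patch=4, max_width=512):
    """Return (new_width, new_height) for a resize that keeps the aspect ratio.

    Hypothetical helper, not the paper's DRS module: the height is fixed,
    the width is scaled by the same factor, then snapped to the patch grid.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Use one scale factor for both axes so characters are not stretched.
    scale = target_height / height
    raw_width = width * scale

    # Snap to the nearest multiple of the patch size, keeping at least one patch.
    snapped = max(patch, round(raw_width / patch) * patch)

    # Cap very long lines at the largest patch-aligned width allowed.
    widest = max_width - (max_width % patch)
    return min(snapped, widest), target_height


if __name__ == "__main__":
    # A 200x50 crop is scaled by 32/50, giving a 128x32 input.
    print(aspect_preserving_size(200, 50))
```

Snapping the width to the patch grid introduces a small distortion of at most half a patch, which is the usual trade-off when a fixed patch size must divide the input evenly.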