Oracle Character Recognition Using Universal Inverted Bottleneck and Inverse Image Frequency



Abstract

The oracle bone script, one of the best-known ancient writing systems, plays a key role in the study of ancient Chinese characters. To speed up the digitization of oracle bone documents through automatic recognition, we propose a recognition model called QROB (Quick Response Oracle Bone). Only a small portion of oracle bone characters has been translated, so the sample distribution across characters is imbalanced. To address this, we apply the Inverse Image Frequency de-biasing method and incorporate the UIB (Universal Inverted Bottleneck) module into a lightweight model structure to improve training performance. The limited number of translated samples also leaves character datasets sparse. To overcome this, we apply the FFD (Free-Form Deformation) method for data augmentation. We also introduce a new dataset, OBC-V, which integrates oracle bone characters and words better than existing datasets. Experimental results on three datasets (OBC-V, HWOBV, and OBC306) demonstrate the effectiveness of our approach. This study advances oracle bone character recognition and contributes to more efficient and accurate interpretation of ancient scripts. The code is available at https://github.com/alphazzv/DnUse
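
The abstract does not state the formula behind Inverse Image Frequency de-biasing, so the following is only a minimal sketch of the general idea of weighting classes inversely to their image counts. The function names, the balanced weighting `N / (K * n_c)`, and the toy labels are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: w_c = N / (K * n_c).

    N is the number of images, K the number of classes, and n_c the
    number of images of class c. Rare classes receive larger weights,
    and the weights average to 1 over the whole training set.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}


def weighted_nll(probs, labels, weights):
    """Mean class-weighted negative log-likelihood.

    probs is a list of dicts mapping class name to predicted probability,
    one dict per sample; labels holds the true class of each sample.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Toy, hypothetical label set: one frequent and one rare character class.
toy_labels = ["frequent"] * 8 + ["rare"] * 2
toy_weights = inverse_frequency_weights(toy_labels)
```

With a weighting of this kind, an error on a rarely seen character contributes more to the loss than an error on a frequently seen one, which is the usual motivation for frequency-based de-biasing on long-tailed datasets.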
