Associative Modeling of Chinese Character Stroke Sequences Combining Transformer and Geometric Constraints
Abstract
This research introduces a combined deep learning framework for classifying Chinese character stroke sequences that integrates multi-scale spatial encoding, hierarchical attention modeling, and geometric constraint learning. Each stroke sample is encoded as a 2D trajectory-density feature map that retains spatial continuity, temporal progression, and fine-grained geometric structure. A U-Net backbone extracts multi-resolution spatial features, while a Swin Transformer captures long-range contextual dependencies through shifted-window self-attention. To address the visual similarity between stroke classes and the significant variability of handwritten strokes, a geometric constraint loss term is added that encourages curvature smoothness, directional consistency, and structural fidelity. Experiments on four main stroke classes (heng, shu, pie, and na) demonstrated strong classification performance, with 98.6% accuracy, average precision above 0.994, and AUC above 0.995. Evaluations using the confusion matrix, ROC curves, precision–recall curves, and error-rate metrics (FPR/FNR) each establish the robustness and generalizability of the model across different handwriting styles. These findings indicate that the framework effectively captures stroke-level spatial–temporal patterns and provides a robust basis for downstream applications, including character reconstruction, handwriting analysis, digital calligraphy, and intelligent writing-education systems.
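The abstract does not specify how the trajectory-density map or the geometric constraint term is computed, so the following is only a minimal illustrative sketch under stated assumptions. It assumes a stroke is an ordered list of (x, y) points normalized to [0, 1], rasterizes it into a density map by counting densified trajectory points per pixel, and uses the mean squared turning angle as a stand-in for a curvature-smoothness penalty. The function names `trajectory_density_map` and `curvature_penalty`, the 64-pixel grid size, and both formulas are hypothetical and are not taken from the paper.

```python
import numpy as np

def trajectory_density_map(points, size=64):
    """Rasterize an ordered stroke trajectory into a size x size density map.

    Assumption: `points` has shape (N, 2) with (x, y) coordinates in [0, 1].
    """
    pts = np.asarray(points, dtype=float)
    # Densify the polyline so the segments between samples also contribute.
    dense = []
    for p, q in zip(pts[:-1], pts[1:]):
        n = max(2, int(np.ceil(np.linalg.norm(q - p) * size * 2)))
        t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
        dense.append(p + t * (q - p))
    dense.append(pts[-1:])
    dense = np.vstack(dense)
    # Count how many densified points fall into each pixel (row = y, col = x).
    idx = np.clip((dense * size).astype(int), 0, size - 1)
    grid = np.zeros((size, size))
    np.add.at(grid, (idx[:, 1], idx[:, 0]), 1.0)
    # Normalize so the densest pixel has value 1.
    return grid / grid.max()

def curvature_penalty(points):
    """Mean squared turning angle between consecutive segments.

    Hypothetical stand-in for a curvature-smoothness term; it is zero for a
    perfectly straight stroke and grows with sharp changes of direction.
    """
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    ang = np.arctan2(seg[:, 1], seg[:, 0])
    turn = np.diff(ang)
    # Wrap angle differences into [-pi, pi).
    turn = (turn + np.pi) % (2 * np.pi) - np.pi
    return float(np.mean(turn ** 2)) if turn.size else 0.0

# Example: an idealized horizontal stroke (heng) sampled at 20 points.
heng = np.stack([np.linspace(0.1, 0.9, 20), np.full(20, 0.5)], axis=1)
density = trajectory_density_map(heng)
penalty = curvature_penalty(heng)
```

In a training setting, a penalty of this kind would have to be written in a differentiable framework and applied to whatever geometric quantities the model exposes; the NumPy version here is only meant to make the idea concrete.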