Enhancing Handwritten Mathematical Expression Recognition with Hybrid Encoding and Disentangled Attention Mechanisms
Abstract
Handwritten Mathematical Expression Recognition (HMER) converts handwritten formulas into machine-readable formats, with wide applications in digital education and scholarly communication. While the dominant CNN-Transformer architecture has shown promise, it suffers from two fundamental bottlenecks: CNNs model long-range dependencies inefficiently because of their local receptive fields, and coverage mechanisms suffer from signal conflicts caused by heterogeneous attention. To overcome both challenges, this paper introduces DAT-Former, an architecture built on two complementary innovations: a globally aware hybrid encoder with a task-adaptive 2D Rotary Position Embedding (2D-RoPE) that explicitly captures spatial topology, and an Adaptive Gated Coverage Module (AGCM) that uses a data-driven gate to resolve attention conflicts. Extensive experiments demonstrate state-of-the-art recognition rates of 63.86%, 60.51%, and 64.89% on CROHME 2014, 2016, and 2019, respectively, and strong generalization on HME100K. This work shows that a carefully designed sequence-to-sequence paradigm can rival more complex tree-based approaches, setting a new benchmark for robust HMER. The source code is available at https://github.com/jichaoqun/DAT-former.
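To make the idea of a 2D rotary position embedding more concrete, the following is a minimal sketch of one common axial formulation: half of a feature vector is rotated by the row index and the other half by the column index. It is not taken from the DAT-Former code. The function names `rope_1d` and `rope_2d`, the frequency base of 10000, and the half-and-half channel split are all assumptions made for illustration, and the "task-adaptive" part described in the abstract is not modeled here.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by angles proportional to `pos`.

    Assumption: standard RoPE frequencies, theta_p = base ** (-2p / d) for pair p.
    """
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(0, d, 2):  # i is the element index, i.e. 2 * pair index
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Hypothetical axial 2D variant: first half encodes `row`, second half `col`."""
    d = len(vec)
    assert d % 4 == 0, "feature dimension must be divisible by 4"
    half = d // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (made-up values, dimension 8).
q = [0.5, -1.0, 2.0, 0.3, 1.5, -0.7, 0.2, 0.9]
k = [1.0, 0.4, -0.6, 1.2, -0.3, 0.8, 1.1, -0.5]

# Two query/key placements that share the same relative offset (2 rows, 3 columns).
score_a = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 2))
score_b = dot(rope_2d(q, 12, 9), rope_2d(k, 10, 6))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, because each rotated pair contributes a term that depends only on the difference in row (or column) position. This relative-offset property is what lets a rotary scheme expose 2D spatial relationships, such as a superscript sitting above and to the right of its base, directly to attention.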