Enhancing Handwritten Mathematical Expression Recognition with Hybrid Encoding and Disentangled Attention Mechanisms

Abstract

Handwritten Mathematical Expression Recognition (HMER) is a crucial technology for converting handwritten formulas into machine-readable formats, with wide applications in digital education and scholarly communication. While the dominant CNN-Transformer architecture has shown promise, it suffers from two fundamental bottlenecks: the inefficiency of CNNs in modeling long-range dependencies due to their local receptive fields, and signal conflicts in coverage mechanisms caused by heterogeneous attention. To overcome these dual challenges, this paper introduces DAT-Former, a novel architecture featuring two synergistic innovations: a globally-aware hybrid encoder with a task-adaptive 2D Rotary Position Embedding (2D-RoPE) to explicitly capture spatial topology, and an Adaptive Gated Coverage Module (AGCM) that uses a data-driven gate to resolve attention conflicts. Extensive experiments demonstrate state-of-the-art recognition rates of 63.86%, 60.51%, and 64.89% on CROHME 2014/16/19, respectively, and exceptional generalization on HME100K. This work highlights that a carefully designed sequence-to-sequence paradigm can rival more complex tree-based approaches, setting a new benchmark for robust HMER. The source code is available at https://github.com/jichaoqun/DAT-former.
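To make the 2D-RoPE idea concrete: the usual approach is to split the channel dimension in half and apply a 1D rotary embedding over row indices to one half and over column indices to the other, so attention scores between feature-map positions depend on their relative spatial offsets. The sketch below is a minimal NumPy illustration under that assumption; the function names (`rope_1d`, `rope_2d`) and the interleaved channel pairing are illustrative choices, not details taken from the paper.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate interleaved channel pairs of x by angles pos * theta_i.

    x:   array (..., d) with d even
    pos: array broadcastable to x's leading dims (positions)
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)        # (d/2,) frequencies
    ang = pos[..., None] * theta                     # (..., d/2) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin             # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(feat):
    """Apply 2D-RoPE to a feature map feat of shape (H, W, d).

    First half of the channels is rotated by row index,
    second half by column index (a common 2D-RoPE factorization).
    """
    H, W, d = feat.shape
    rows = np.broadcast_to(np.arange(H)[:, None], (H, W)).astype(float)
    cols = np.broadcast_to(np.arange(W)[None, :], (H, W)).astype(float)
    out = feat.copy()
    out[..., : d // 2] = rope_1d(feat[..., : d // 2], rows)
    out[..., d // 2 :] = rope_1d(feat[..., d // 2 :], cols)
    return out
```

Because each channel pair undergoes a pure rotation, the embedding preserves feature norms while injecting position into the query-key inner products; the paper's task-adaptive variant presumably learns or conditions these frequencies, which this fixed-`base` sketch does not model.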