BézierFormer: Affine-Invariant Shape Classification via Control Point Attention
Abstract
We address the resolution paradox in modern deep learning, where networks receive far more spatial information than they demonstrably use for shape classification. We show that it is possible to train networks directly on spatially sparse and structurally compressed shape representations rather than dense pixel grids. Specifically, we extract vector graphic representations of shapes from raster images and train on the control points of Bézier curves, which naturally encode the sparse, localized features (corners, curvature extrema) that both human vision and interpretability studies identify as critical for recognition. To handle these sparse geometric primitives, we propose BézierFormer, an attention-based architecture that processes each parametric curve segment independently through shared-weight transformations and then synthesizes global shape understanding through tailored attention mechanisms. This combination of sparse vector graphic training data and segment-wise processing with attention-based synthesis achieves computational efficiency while maintaining high discriminative power, demonstrating that classification can be performed with dramatically fewer geometric primitives than the pixels used in conventional approaches.
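To make the described pipeline concrete, the following is a minimal sketch of the two-stage idea from the abstract: a shared-weight encoder applied to each Bézier segment's control points, followed by attention across segments to form a global shape representation. It assumes cubic segments (four 2D control points each) and substitutes a standard transformer encoder with a class token for the paper's "tailored attention mechanisms"; the class and parameter names (`BezierFormerSketch`, `points_per_segment`, etc.) are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BezierFormerSketch(nn.Module):
    """Illustrative sketch: per-segment shared-weight encoding of Bezier
    control points, then attention-based synthesis into a class prediction."""

    def __init__(self, points_per_segment=4, d_model=128, n_heads=4,
                 n_layers=2, n_classes=10):
        super().__init__()
        # Shared-weight transformation applied to every curve segment independently.
        self.segment_encoder = nn.Sequential(
            nn.Linear(points_per_segment * 2, d_model),  # (x, y) per control point
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Attention layers that mix information across segments (stand-in for the
        # paper's tailored attention) to build a global shape representation.
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=2 * d_model,
                                           batch_first=True)
        self.attention = nn.TransformerEncoder(layer, n_layers)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, segments):
        # segments: (batch, n_segments, points_per_segment, 2) control-point coordinates
        b, s, p, _ = segments.shape
        tokens = self.segment_encoder(segments.reshape(b, s, p * 2))  # one token per segment
        tokens = torch.cat([self.cls_token.expand(b, -1, -1), tokens], dim=1)
        pooled = self.attention(tokens)[:, 0]  # class token summarizes the whole shape
        return self.head(pooled)


# Usage: a batch of 8 shapes, each described by 12 cubic Bezier segments.
model = BezierFormerSketch()
logits = model(torch.randn(8, 12, 4, 2))
print(logits.shape)  # torch.Size([8, 10])
```

Note that the input here is a handful of control-point tokens per shape rather than a dense pixel grid, which is the source of the efficiency claim in the abstract.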