MNI-GAIR: Multi-scale Normal Image and Grid Attention-based Image Recognition
Abstract
Achieving high-precision facial recognition in complex scenarios remains a key challenge in computer vision. This paper proposes MNI-GAIR, a multi-modal collaborative recognition framework that combines multi-scale normal image generation, a dynamic grid attention mechanism, and point cloud generalization to address two weaknesses of existing methods: inefficient cross-modal feature alignment and insufficient real-time performance with monocular input. Together, these components significantly improve recognition in complex scenarios such as occlusion and extreme poses. First, a multi-scale normal map generation module based on differentiable rendering is designed; it combines GLCM-LBP features with a Cascaded Atrous Pyramid (CAP) and improves noise robustness by 23.6% [bib1]. Second, a dynamic grid partitioning attention network (DGPA-Net) is proposed; it optimizes the grid structure through a gradient-driven approach, incorporates a dual-path attention mechanism, and improves recognition accuracy in extreme side-face (\(>75^\circ\)) scenarios by 14.7% [bib2]. Finally, a point cloud generalization framework based on Lie group theory is introduced, enabling cross-modal feature fusion and reducing the cross-pose equal error rate (EER) to 1.23% [bib3]. Experimental results on multiple standard datasets, including FaceScape and LFW, show that MNI-GAIR outperforms existing methods in accuracy, robustness, and computational efficiency, providing a systematic solution for 3D facial analysis. The source code is available on GitHub at https://github.com/LLxuLL/MNI-GAIR-Multi-scale-Normal-Image-and-Grid-Attention-based-Image-Recognition.
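For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate is commonly computed from verification scores. It is not taken from the MNI-GAIR code base: the function name `equal_error_rate`, the score lists, and the convention that a higher score means "same identity" are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER for similarity scores (higher = more likely same identity).

    Hypothetical helper for illustration; returns (eer, threshold).
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False acceptance rate: impostor pairs scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine pairs scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest to equal.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (made up): one genuine pair scores low, one impostor pair scores high.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, impostor)
```

On this toy data the two error rates meet at a threshold of 0.6, where one of four impostor pairs is accepted and one of four genuine pairs is rejected. An EER of 1.23% would correspond to the operating point at which both rates equal 0.0123.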