MNI-GAIR: Multi-scale Normal Image and Grid Attention-based Image Recognition

Maoyang Xu
Zhuqing Zheng
Borun He
Yinfeng Chen
Jinye Wang
Chen Han
Gengyifan Shang
Lihe Chen
Wancheng Zhao
YuFei Zhou
Changjiang Zhang

Abstract

Achieving high-precision facial recognition in complex scenarios is one of the key challenges in computer vision. This paper proposes a multi-modal collaborative recognition framework, MNI-GAIR, which combines multi-scale normal image generation, dynamic grid attention mechanisms, and point cloud generalization techniques to address the inefficient cross-modal feature alignment and the insufficient monocular real-time performance of existing methods. These innovations significantly enhance recognition performance in complex scenarios such as occlusion and extreme poses. First, a multi-scale normal map generation module based on differentiable rendering is designed; it combines GLCM-LBP features with a Cascaded Atrous Pyramid (CAP) and improves noise robustness by 23.6% [bib1]. Second, a dynamic grid partitioning attention network (DGPA-Net) is proposed, which optimizes grid structures through a gradient-driven approach and incorporates dual-path attention mechanisms, improving recognition accuracy in extreme side-face (>75°) scenarios by 14.7% [bib2]. Finally, a point cloud generalization framework based on Lie group theory is introduced, enabling cross-modal feature fusion and reducing the cross-pose equal error rate (EER) to 1.23% [bib3]. Experimental results on multiple standard datasets, including FaceScape and LFW, demonstrate that MNI-GAIR outperforms existing methods in accuracy, robustness, and computational efficiency, providing a systematic solution for 3D facial analysis. The source code is available on GitHub at https://github.com/LLxuLL/MNI-GAIR-Multi-scale-Normal-Image-and-Grid-Attention-based-Image-Recognition.
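
The abstract reports a cross-pose EER figure. For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. It is not taken from the authors' repository: the function name `equal_error_rate`, the sample scores, and the convention that a higher score means "same identity" are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores (assumed: higher = more likely same identity).

    Returns (eer, threshold), where threshold is the observed score at which the
    false accept rate and false reject rate are closest to each other.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False accept rate: fraction of impostor pairs scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False reject rate: fraction of genuine pairs scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest; average them there.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Hypothetical scores, for illustration only (not data from the paper).
demo_genuine = [0.9, 0.8, 0.3, 0.7]
demo_impostor = [0.1, 0.2, 0.75, 0.4]
demo_eer, demo_threshold = equal_error_rate(demo_genuine, demo_impostor)
print(f"EER = {demo_eer:.2%} at threshold {demo_threshold}")
```

With only a handful of scores the estimate is coarse, because the error rates can change only in steps of one sample. On a real benchmark the same procedure would be run over all genuine and impostor pairs of the evaluation protocol.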

Version published to 10.21203/rs.3.rs-6821913/v1 on Research Square
Nov 19, 2025

Related articles

AttnLink: Enhancing Cross-Modal Fusion for Robust Image-to-PointCloud Place Recognition

This article has 2 authors:
1. Ziyu Fang
2. Minghao Ye
This article has no evaluations
Latest version Jan 14, 2026

Adaptive Feature Alignment and Enhancement for Precise Fine-Grained Visual Recognition

This article has 3 authors:
1. Qianhao Zhao
2. Jianlei Liu
3. Ke Zhang
This article has no evaluations
Latest version Feb 23, 2026

Vectorial Total Symmetric Variation and Applications to Color Image Decomposition

This article has 3 authors:
1. Roy Y. He
2. Martin Huska
3. Hao Liu
This article has no evaluations
Latest version Feb 16, 2026
