FasterMLP: Multilayer Perceptron-based Attention Mechanism and Wavelet Sampling Fusion Networks


Abstract

The integration of multi-layer perceptrons (MLPs) with convolutional neural networks (CNNs) and attention mechanisms has recently shown promising improvements in model performance across various domains. This paper presents a novel deep neural network named FasterMLP, which combines MLPs, CBAM (Convolutional Block Attention Module) attention, and Haar wavelet downsampling to achieve high efficiency and accuracy. The architecture comprises four stages, each built from CBAM-enhanced convolutional networks and connected to the next by Haar wavelet downsampling modules, which reduce spatial dimensions while preserving feature information. FasterMLP was evaluated on a suite of benchmark tasks: image classification on ImageNet-1K, object detection on COCO, and instance segmentation on Cityscapes. The evaluation shows that FasterMLP outperforms many lightweight models in both speed and accuracy. On ImageNet-1K, FasterMLP-S achieves a top-1 accuracy 3.9% higher than that of MobileViT-XXS while running 2x faster on GPU and 2.7x faster on CPU. On COCO, FasterMLP-L has a parameter count close to that of FasterNet-S but performance comparable to that of FasterNet-M. On Cityscapes, the mIoU reaches 81.7%, higher than that of CCNet, DANet, and other models. FasterMLP also performs robustly in object detection and instance segmentation, supporting its efficacy in real-world applications. These findings demonstrate the potential of combining traditional CNN architectures with MLPs and attention mechanisms to improve computational efficiency and accuracy in visual perception tasks, particularly in resource-constrained environments.
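To make the downsampling step concrete, the following is a minimal sketch of a single-level 2D Haar transform used as a downsampling operation. It is not the authors' module: the abstract does not describe the implementation, so the function name `haar_downsample`, the channel-first `(C, H, W)` layout, the orthonormal scaling, and the choice to stack the four subbands along the channel axis are all assumptions made for illustration.

```python
import numpy as np


def haar_downsample(x: np.ndarray) -> np.ndarray:
    """Single-level 2D Haar transform used as a downsampling step (illustrative).

    x: array of shape (C, H, W) with even H and W (assumed layout).
    Returns an array of shape (4 * C, H // 2, W // 2).
    """
    _, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must both be even")

    # The four pixels of every non-overlapping 2x2 block.
    top_left = x[:, 0::2, 0::2]
    top_right = x[:, 0::2, 1::2]
    bottom_left = x[:, 1::2, 0::2]
    bottom_right = x[:, 1::2, 1::2]

    # Orthonormal Haar combinations: one low-pass band, three detail bands.
    low = (top_left + top_right + bottom_left + bottom_right) / 2.0
    col_detail = (top_left - top_right + bottom_left - bottom_right) / 2.0
    row_detail = (top_left + top_right - bottom_left - bottom_right) / 2.0
    diag_detail = (top_left - top_right - bottom_left + bottom_right) / 2.0

    # Stack subbands on the channel axis: resolution halves, channels quadruple.
    return np.concatenate([low, col_detail, row_detail, diag_detail], axis=0)


x = np.random.default_rng(0).standard_normal((3, 8, 8))
y = haar_downsample(x)
print(y.shape)  # (12, 4, 4)
```

Unlike strided convolution or pooling, this transform is invertible, so halving the resolution discards no information; it only moves it into extra channels. In a network, a learned projection could follow to set the channel width of the next stage, though whether FasterMLP does so is not stated in the abstract.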
