Enhancing Vision Transformer with Multiple Fractional-Order Differential Operators for Image Desnowing
Abstract
Image desnowing aims to remove the complex visual degradation caused by snowflake noise and is an important branch of image restoration. In this paper, we focus on the self-similar, complex edges and rough, translucent structures of snowflake noise, which can be characterized by fractal dimension. We model these fractal features with multiple fractional-order differential operators and use them to enhance the Vision Transformer (ViT), yielding the proposed MF-ViT. MF-ViT is a dedicated deep learning desnowing model built on an explicit prior model of the fractal features of snowflake noise. Specifically, to strengthen the representation of fractal features, we incorporate fractional-order differential operators of different orders into the attention and feed-forward networks of ViT. We empirically evaluate MF-ViT on five public benchmark desnowing datasets. The results show that MF-ViT achieves state-of-the-art performance on both synthetic and real-world snowy images. This work also suggests new ways to improve models for other machine vision and pattern analysis tasks that involve fractal features. Accepted at MMM 2026 (International Conference on Multimedia Modeling), to appear in Springer LNCS. This is the author-created version of the manuscript.
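The abstract does not specify which fractional-order operators MF-ViT uses, which orders it picks, or how they are wired into the attention and feed-forward layers. As an illustration only, the sketch below uses the standard Grünwald-Letnikov (GL) discretization, one common way to build fractional-order difference masks for images. It computes an order-alpha backward difference along rows and columns and stacks one response map per order. The function names (`gl_coefficients`, `fractional_diff_2d`, `multi_order_features`), the example orders 0.3/0.5/0.7, the truncation length, and the edge-clamping rule are all hypothetical choices. This is not the authors' implementation.

```python
from typing import List, Sequence

Image = List[List[float]]


def gl_coefficients(alpha: float, n_terms: int) -> List[float]:
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), k = 0..n_terms-1.

    Uses the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k),
    which avoids evaluating Gamma functions directly.
    """
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def fractional_diff_2d(img: Image, alpha: float, n_terms: int = 3) -> Image:
    """Truncated order-alpha GL backward difference along rows plus along columns.

    Step size h = 1 (one pixel). Pixels before the image border are replaced by
    the nearest edge pixel (a hypothetical boundary rule chosen for simplicity).
    """
    h, w = len(img), len(img[0])
    weights = gl_coefficients(alpha, n_terms)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Horizontal term: weighted sum over the current and previous columns.
            dx = sum(c * img[i][max(j - k, 0)] for k, c in enumerate(weights))
            # Vertical term: weighted sum over the current and previous rows.
            dy = sum(c * img[max(i - k, 0)][j] for k, c in enumerate(weights))
            out[i][j] = dx + dy
    return out


def multi_order_features(
    img: Image, orders: Sequence[float] = (0.3, 0.5, 0.7), n_terms: int = 3
) -> List[Image]:
    """One response map per fractional order (the default orders are illustrative)."""
    return [fractional_diff_2d(img, a, n_terms) for a in orders]


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
    print(fractional_diff_2d(step, 1.0, n_terms=2)[0])  # [0.0, 0.0, 1.0, 0.0]
    print(fractional_diff_2d(step, 0.5)[0])  # [0.0, 0.0, 1.375, 0.875]
```

With order 1 and two terms, the weights reduce to the ordinary backward difference [1, -1], so only the edge pixel responds. With order 0.5, the edge pixel still has the largest response, but pixels inside the bright region stay nonzero, because the GL weights decay gradually instead of cutting off after one step. Stacking several orders therefore gives a family of responses that trade edge sharpness against this longer-range memory, which is the general idea behind using multiple orders.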