ViT-ConvGDNet: A Vision Transformer–MobileNet Guided Decoder Network for Robust Copy-Move Forgery Detection and Localization

Abstract

Copy-move forgery is a common form of digital image manipulation in which a portion of an image is copied and pasted elsewhere within the same image. It is especially difficult to detect when the copied regions have undergone post-processing operations such as rotation, scaling, or blurring. We propose a new encoder-decoder framework, ViT-ConvGDNet, which integrates the global contextual modeling of Vision Transformers with the efficient local feature extraction of MobileNet's convolutional operations. Sobel edge detection is incorporated into the encoder to sharpen boundary awareness, and Atrous Spatial Pyramid Pooling (ASPP) captures the multi-scale contextual information needed for accurate localization. A layer-wise weighted loss mechanism guides the decoding process, applying a custom mixture of loss functions to every decoder layer to improve prediction accuracy. By exploiting patch-based self-attention, ViT-ConvGDNet learns long-range dependencies effectively and adapts to images of varying scale and complexity. Extensive evaluations on several benchmark datasets, including MICC-F600, MICC-F2000, IMD, Coverage, CoMoFoD, Ardizzone, GRIP, and CASIA, demonstrate the model's superior performance. Experiments show that ViT-ConvGDNet outperforms several current deep learning methods and provides a robust and scalable solution to challenging copy-move forgery detection and localization.
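
The abstract does not give implementation details, but a minimal PyTorch sketch can illustrate two of the named components: a fixed-kernel Sobel branch that injects edge magnitude into the encoder, and a layer-wise weighted (deep-supervision) loss that applies a loss mixture to every decoder stage. The class names, the stage weights, and the BCE + Dice mixture below are assumptions for illustration, not the paper's specification.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SobelEdge(nn.Module):
        """Fixed Sobel filters producing an edge-magnitude map; one plausible
        way to wire the abstract's 'Sobel edge detection' into the encoder."""
        def __init__(self):
            super().__init__()
            gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
            gy = gx.t()  # Sobel y-kernel is the transpose of the x-kernel
            self.register_buffer("kernel", torch.stack([gx, gy]).unsqueeze(1))

        def forward(self, x):
            gray = x.mean(dim=1, keepdim=True)           # (B, 1, H, W)
            g = F.conv2d(gray, self.kernel, padding=1)   # (B, 2, H, W)
            return g.pow(2).sum(dim=1, keepdim=True).sqrt()  # edge magnitude

    class LayerwiseWeightedLoss(nn.Module):
        """Deep-supervision loss: each decoder stage predicts a mask, and its
        BCE + Dice loss is weighted by depth (hypothetical weights; the exact
        mixture used in the paper is not stated in the abstract)."""
        def __init__(self, weights=(0.1, 0.2, 0.3, 0.4)):
            super().__init__()
            self.weights = weights  # shallowest -> deepest decoder stage

        @staticmethod
        def dice_loss(logits, target, eps=1e-6):
            pred = torch.sigmoid(logits)
            inter = (pred * target).sum(dim=(1, 2, 3))
            union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
            return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

        def forward(self, stage_logits, target):
            """stage_logits: list of per-stage mask logits at varying
            resolutions; target: binary forgery mask of shape (B, 1, H, W)."""
            total = 0.0
            for w, logits in zip(self.weights, stage_logits):
                # Resize the ground-truth mask to each stage's resolution.
                t = F.interpolate(target, size=logits.shape[-2:], mode="nearest")
                bce = F.binary_cross_entropy_with_logits(logits, t)
                total = total + w * (bce + self.dice_loss(logits, t))
            return total

Weighting deeper stages more heavily (as in the hypothetical weights above) reflects the usual deep-supervision design: coarse stages receive weaker gradients as regularization, while the final stage dominates the optimization.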
