MS-Adapter: Multi-scaled Adapter for Efficient DeepFake Detection

Abstract

Existing deepfake detection methods rely heavily on low-level forgery patterns, leading to poor performance when encountering unseen forgery types or low-quality images. Recently, Vision Transformers (ViTs) pretrained on large-scale datasets have demonstrated strong generalization capabilities across various downstream image tasks. However, parameter-efficient fine-tuning methods for ViTs have shown limited effectiveness in deepfake detection, mainly because ViTs rely on high-level semantics while struggling to capture fine-grained local details. To address this issue, this paper proposes MS-Adapter, a multi-scale adapter network for efficient deepfake detection. By embedding multi-scale adapter modules within a pretrained ViT, MS-Adapter progressively extracts and fuses features across multiple scales, from low-level forgery artifacts to high-level semantic forgery patterns. In addition, a Temporal Aggregation Transformer receives the frame-level features extracted by the multi-scale adapters and performs temporal modeling on them to further improve forgery detection. Experimental results demonstrate that MS-Adapter achieves superior performance on multiple datasets, including FF++, Celeb-DFv2, and DFDC, while requiring only a small number of trainable parameters.
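The abstract names two components but gives no implementation details, so the PyTorch sketch below is purely illustrative of the described pipeline, not the authors' design. The bottleneck-adapter structure, the kernel sizes (3/5/7), the bottleneck width, fusion by summation, and the temporal encoder's depth, head count, and mean pooling are all assumptions made for this sketch.

```python
import torch
import torch.nn as nn


class MultiScaleAdapter(nn.Module):
    """Illustrative bottleneck adapter: parallel depthwise convolutions at
    several kernel sizes capture local forgery artifacts, fused by summation.
    All hyperparameters here are assumptions, not the paper's specification."""

    def __init__(self, dim: int, bottleneck: int = 64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)    # project to a small bottleneck
        self.branches = nn.ModuleList(
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in kernel_sizes                 # one depthwise conv per scale
        )
        self.up = nn.Linear(bottleneck, dim)      # project back to the ViT width
        self.act = nn.GELU()

    def forward(self, tokens, hw):
        # tokens: (B, N, dim) patch tokens from a frozen ViT block, N == H * W
        B, N, _ = tokens.shape
        H, W = hw
        x = self.act(self.down(tokens))
        x = x.transpose(1, 2).reshape(B, -1, H, W)      # restore spatial layout
        x = sum(branch(x) for branch in self.branches)  # fuse the scales
        x = x.flatten(2).transpose(1, 2)                # back to (B, N, bottleneck)
        return tokens + self.up(self.act(x))            # residual adapter update


class TemporalAggregation(nn.Module):
    """Illustrative temporal transformer over per-frame embeddings, mean-pooled
    into one video-level real/fake logit. Depth and head count are assumptions."""

    def __init__(self, dim: int, depth: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 1)

    def forward(self, frame_feats):
        # frame_feats: (B, T, dim), one embedding per sampled video frame
        x = self.encoder(frame_feats)
        return self.head(x.mean(dim=1)).squeeze(-1)


# Shape check only; a real pipeline would feed tokens from a frozen pretrained ViT.
adapter = MultiScaleAdapter(dim=768)
temporal = TemporalAggregation(dim=768)
tokens = torch.randn(2, 14 * 14, 768)          # patch tokens for one frame
frames = torch.randn(2, 8, 768)                # 8 frame-level features per video
print(adapter(tokens, (14, 14)).shape)         # torch.Size([2, 196, 768])
print(temporal(frames).shape)                  # torch.Size([2])
```

In a setup like this, the pretrained ViT backbone stays frozen and only the adapter and temporal-encoder weights train, which is consistent with the abstract's claim of a small number of trainable parameters.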
