Real Time Detection of Deepfakes Using the Efficient Swin Attention Network with Global and Local Facial Features


Abstract

The rapid advancement of deepfake technology poses significant challenges for ensuring authenticity and combating misinformation. Although recent deepfake detection approaches have achieved notable progress, many methods that perform robustly across diverse datasets fail to strike a balance between accuracy and real-time efficiency. This research proposes the Efficient-Swin Attention Network (ESANet), a novel framework that leverages local and global facial features for enhanced real-time deepfake detection. Our framework integrates EfficientNet-B0 for lightweight local feature extraction and the Swin Transformer to capture hierarchical global relationships. Combining the strengths of both deep models yields a comprehensive feature representation through an efficient feature fusion mechanism. We evaluate ESANet on three benchmark datasets: FaceForensics++, CelebV1, and CelebV2. The experimental results demonstrate that ESANet achieves accuracies of 96.5%, 95.3%, and 94.8% on the FaceForensics++, CelebV1, and CelebV2 datasets, respectively, while maintaining inference times low enough for real-time use. Furthermore, cross-dataset tests demonstrate the robustness and generalizability of our proposed scheme, which effectively addresses the challenges of real-time deepfake detection.
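The abstract describes fusing pooled local features from EfficientNet-B0 with global features from a Swin Transformer, but does not specify the fusion mechanism. The following is a minimal, hypothetical sketch of one common choice (concatenation followed by a learned projection and a binary real/fake head), using NumPy arrays as stand-ins for the two backbones' pooled outputs; the dimensions 1280 (EfficientNet-B0) and 768 (Swin-Tiny) are the standard pooled feature sizes of those models, while the 512-d joint embedding and all weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature vectors from the two branches.
local_feats = rng.standard_normal(1280)   # EfficientNet-B0 branch (local texture cues)
global_feats = rng.standard_normal(768)   # Swin Transformer branch (global structure)

# Simple fusion sketch: concatenate, then project to a joint embedding.
fused = np.concatenate([local_feats, global_feats])    # shape (2048,)
W = rng.standard_normal((512, fused.shape[0])) * 0.01  # hypothetical projection weights
joint = np.tanh(W @ fused)                             # shape (512,) fused representation

# Hypothetical binary real/fake classification head (sigmoid output).
w_cls = rng.standard_normal(512) * 0.01
score = 1.0 / (1.0 + np.exp(-(w_cls @ joint)))         # probability the face is fake
print(fused.shape, joint.shape)
```

In a real implementation the projection and classifier weights would be trained end-to-end with the two backbones; this sketch only illustrates how the two feature streams could be combined into a single decision.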
