Real Time Detection of Deepfakes Using the Efficient Swin Attention Network with Global and Local Facial Features
Abstract
The rapid advance of deepfake technology poses significant challenges to ensuring authenticity and combating misinformation. Although recent deepfake detection approaches have achieved notable progress, many methods that perform robustly across diverse datasets fail to strike a balance between accuracy and real-time efficiency. This research proposes the Efficient-Swin Attention Network (ESANet), a novel framework that leverages local and global facial features for enhanced real-time deepfake detection. Our framework integrates EfficientNet-B0 for lightweight local feature extraction with the Swin Transformer to capture hierarchical global relationships; an efficient feature fusion mechanism combines the strengths of both models into a comprehensive feature representation. We evaluate ESANet on three benchmark datasets: FaceForensics++, CelebV1, and CelebV2. The experimental results demonstrate that ESANet achieves accuracies of 96.5%, 95.3%, and 94.8% on the FaceForensics++, CelebV1, and CelebV2 datasets, respectively, while maintaining inference times low enough for real-time use. Furthermore, cross-dataset tests demonstrate the robustness and generalizability of the proposed scheme, which effectively addresses the challenges of real-time deepfake detection.
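To make the dual-branch design described above concrete, the sketch below shows one plausible way to pair an EfficientNet-B0 branch (local features) with a Swin Transformer branch (global features) and fuse their pooled outputs through a small classification head. It is a minimal illustration built on standard torchvision backbones; the class name DualBackboneDetector, the choice of the Swin-Tiny variant, the fusion-by-concatenation design, and all layer sizes are assumptions made for illustration and are not taken from the paper.

# Minimal PyTorch sketch of a dual-backbone detector in the spirit of ESANet.
# The exact fusion mechanism, feature dimensions, and training setup used by
# ESANet are not specified here; this is an illustrative assumption, not the
# authors' implementation.
import torch
import torch.nn as nn
from torchvision import models


class DualBackboneDetector(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Local-feature branch: EfficientNet-B0 (1280-dim pooled features).
        # weights=None keeps the sketch self-contained; pretrained weights
        # could be loaded instead.
        self.local = models.efficientnet_b0(weights=None)
        self.local.classifier = nn.Identity()
        # Global-feature branch: Swin-Tiny transformer (768-dim pooled features).
        self.glob = models.swin_t(weights=None)
        self.glob.head = nn.Identity()
        # Fusion by concatenation followed by a small classification head
        # (an assumption; the paper describes its own fusion mechanism).
        self.fusion = nn.Sequential(
            nn.Linear(1280 + 768, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_local = self.local(x)    # (B, 1280) local texture cues
        f_global = self.glob(x)    # (B, 768) hierarchical global context
        return self.fusion(torch.cat([f_local, f_global], dim=1))


if __name__ == "__main__":
    model = DualBackboneDetector().eval()
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 224, 224))  # one aligned face crop
    print(logits.shape)  # torch.Size([1, 2]) -> real vs. fake scores

Concatenating the two pooled feature vectors keeps the fusion step lightweight, which is consistent with the abstract's emphasis on real-time inference, but the paper's actual fusion mechanism may be more elaborate.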