Tempo-Spatial-Fusion Network: A Novel Framework for Deepfake Detection through Dynamic Integration of Spatial and Temporal Features
Abstract
The rapid development of deepfake technology has led to increasingly sophisticated AI-generated content that threatens the integrity of digital media. To address this challenge, we present the Tempo-Spatial-Fusion Network (TSF-Net), which systematically integrates spatial and temporal analysis for robust deepfake detection. Unlike previous approaches that focus exclusively on either spatial artifacts or temporal inconsistencies, TSF-Net introduces three key innovations. First, it employs a Cross-Modal Attention Fusion mechanism that dynamically integrates complementary features from EfficientNetV2L and XceptionNet. Second, it incorporates a Temporal Inconsistency Attention Module that explicitly targets frame-to-frame discontinuities. Third, it uses an Artifact-Aware Loss Function that directly penalizes predictions inconsistent with detected manipulation cues. Extensive experiments on the DFDC and FaceForensics++ datasets demonstrate TSF-Net's superior performance (95.36% accuracy, 0.92 F1-score), significantly outperforming both single-model approaches and existing hybrid frameworks. Our theoretical analysis provides new insights into artifact persistence across deepfake generation techniques, and our adaptive computational scaling enables efficient deployment across diverse computational environments. The proposed framework advances the state of the art in deepfake detection by bridging the gap between spatial and temporal analysis while maintaining interpretability through novel visualization techniques.
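The abstract does not specify how the Cross-Modal Attention Fusion mechanism combines the two backbone streams. As an illustration only, the following is a minimal NumPy sketch of one plausible design: features from one stream attend over features from the other via scaled dot-product attention, and the attended context is concatenated back onto the query stream. All names (`cross_modal_attention_fusion`, the projection matrices `W_q`, `W_k`, `W_v`) and the stand-in feature shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention_fusion(feat_a, feat_b, W_q, W_k, W_v):
    """Hypothetical fusion: queries from stream A attend to keys/values
    of stream B; the cross-modal context is concatenated onto A."""
    q = feat_a @ W_q                       # (n_a, d)
    k = feat_b @ W_k                       # (n_b, d)
    v = feat_b @ W_v                       # (n_b, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_a, n_b)
    attn = softmax(scores, axis=-1)          # each row sums to 1
    attended = attn @ v                      # (n_a, d)
    return np.concatenate([feat_a, attended], axis=-1)  # (n_a, 2d)

rng = np.random.default_rng(0)
d = 8
feat_eff = rng.normal(size=(4, d))  # stand-in for EfficientNetV2L features
feat_xcp = rng.normal(size=(6, d))  # stand-in for XceptionNet features
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
fused = cross_modal_attention_fusion(feat_eff, feat_xcp, W_q, W_k, W_v)
print(fused.shape)  # (4, 16)
```

In practice the projections would be learned jointly with the rest of the network; the concatenation here is just one way to realize "dynamic integration" of complementary features.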