Multi-Stage Spatial-Temporal Ensemble Model with Integrated Learning Methods for Robust Deepfake Detection
Abstract
In the era of synthetic media, robust and scalable deepfake detection has become critical to preserving digital content integrity. Existing detection methods often focus narrowly on spatial or temporal features, limiting their generalizability and robustness. This paper proposes an Integrated Learning Methods (ILM) model, a novel multi-stage hybrid architecture combining YOLOv5 for precise face detection, Haar Cascade for face validation, ResNet-50 for hierarchical spatial feature extraction, LightGBM for frame-level classification, LSTM for temporal modeling, and Random Forest for final ensemble fusion. Evaluated on the FaceForensics++ and Celeb-DF (v2) datasets, the proposed ILM achieved 98% accuracy, precision, recall, and F1-score, outperforming state-of-the-art CNN-, RNN-, and transformer-based models. Ablation studies validated the incremental contribution of each module, confirming the synergistic design of ILM in addressing spatial misalignment, temporal inconsistencies, and generalization limitations. The modular and scalable design supports deployment in digital forensics, media authentication, and AI governance, while future work will integrate transformer-based global context encoders and explainable AI for enhanced robustness and interpretability.
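The staged design described above (frame-level scoring, temporal aggregation, then ensemble fusion) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it uses scikit-learn stand-ins (GradientBoostingClassifier in place of LightGBM, a sliding-window mean/std summary in place of the LSTM) and synthetic data in place of ResNet-50 features; all variable names and window sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for ResNet-50 frame features: 200 frames x 16 dims.
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic fake/real frame labels

# Stage 1: frame-level classifier (GradientBoosting as a LightGBM stand-in).
frame_clf = GradientBoostingClassifier(random_state=0).fit(X, y)
frame_scores = frame_clf.predict_proba(X)[:, 1]

# Stage 2: temporal summary per clip of 10 frames (mean/std over a window,
# a crude stand-in for the LSTM's sequence modeling of frame-score dynamics).
window = 10
temporal = np.array([
    [frame_scores[i:i + window].mean(), frame_scores[i:i + window].std()]
    for i in range(0, len(frame_scores) - window + 1, window)
])
clip_labels = np.array([
    int(y[i:i + window].mean() > 0.5)
    for i in range(0, len(y) - window + 1, window)
])

# Stage 3: Random Forest fuses the temporal summaries into a clip-level verdict.
fusion = RandomForestClassifier(random_state=0).fit(temporal, clip_labels)
clip_preds = fusion.predict(temporal)  # one verdict per 10-frame clip
```

The point of the sketch is the data flow, not the specific models: each stage consumes the previous stage's outputs, so any module (face detector, spatial backbone, temporal model) can be swapped independently, which is what makes the architecture modular.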