Deep Learning-Based Framework for Filtering Objectionable Scenes in Cartoon Videos

Abstract

In today’s era, technology has become an essential part of life, affecting it in both positive and negative ways. Advances in technology have made it possible for anyone to access almost all types of content and video clips, both online and offline, without any substantial restrictions. Cartoon videos, whose primary audience is children but which are popular across all ages, have also evolved from manual animation to advanced visual effects. While cartoon quality and depth of detail have improved, objectionable content, e.g., violence and nudity, has grown as well and can have a long-term harmful impact on a person’s psychology and mental health. Because most parental filters are designed around genre, the “Cartoon” label itself becomes a token of permission, leaving children more exposed to these harmful effects. This research focuses on the detection and prediction of nudity and violence in cartoon videos, which can lead to an effective filtering mechanism. Two major deep neural network models, ResNet50 and VGG-16, are used in this research and trained on a custom-made dataset. Because no standard dataset is available, a special dataset was developed by collecting cartoon videos in three categories: Nude, Violent, and Normal. A total of 340 video clips from 25 different series were collected, and from these clips a total of 44,430 frames were selected to train and test the proposed model. A 117-layer neural network is designed, grouped into 11 broad layer types. The input layer is 100 x 100 x 3; the convolutional layer uses a (1,1) filter size with 256 filters; the activation layer has dimensions 50 x 50 x 64; and the output layer has size 3, i.e., Nude, Violent, and Normal. This model achieves an accuracy of 97.25%, outperforming other state-of-the-art technologies.
Additionally, the proposed model is compared with VGG-16, which achieved 92.31% accuracy.
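The abstract describes the network only by its input size (100 x 100 x 3), a (1,1) convolutional filter, and a 3-way output; the full 117-layer architecture is not specified here. The following NumPy sketch illustrates, under those stated shapes, how a 1x1 convolution followed by pooling and a 3-class softmax head would map a single video frame to class probabilities. The channel counts of the intermediate layers, the pooling choice, and all weights are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

def conv1x1(x, w, b):
    # A 1x1 convolution is a per-pixel linear map across channels,
    # followed here by a ReLU activation.
    # x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)
    return np.maximum(x @ w + b, 0.0)

def softmax(z):
    # Numerically stable softmax over the class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical shapes loosely following the abstract:
# a 100x100x3 input frame, a 1x1 conv expanding to 256 channels,
# global average pooling, and a 3-way head (Nude / Violent / Normal).
rng = np.random.default_rng(0)
frame = rng.random((100, 100, 3))                      # one input frame
w1, b1 = 0.1 * rng.standard_normal((3, 256)), np.zeros(256)
w2, b2 = 0.1 * rng.standard_normal((256, 3)), np.zeros(3)

features = conv1x1(frame, w1, b1)      # (100, 100, 256) feature map
pooled = features.mean(axis=(0, 1))    # (256,) global average pool
probs = softmax(pooled @ w2 + b2)      # (3,) class probabilities
```

In a trained model the weights would come from backpropagation over the labeled frames, and the predicted category would be `probs.argmax()`; this sketch only demonstrates the shape flow from frame to 3-class output.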
