Deep Learning-Based Framework for Filtering Objectionable Scenes in Cartoon Videos

Abstract

In today’s era, technology has become an essential part of life, affecting it in both positive and negative ways. Advances in technology have made it possible for anyone to access almost all types of content and video clips, both online and offline, without any substantial restrictions. Cartoon videos, whose primary audience is children but which are popular across all ages, have also evolved from manual animation to advanced visual effects. While cartoon quality and depth of detail have improved, objectionable content, e.g., violence and nudity, has grown as well and can have a long-term harmful impact on a person’s psychology and mental health. Because most parental filters are designed around genre, the “Cartoon” label itself becomes a token of permission, leaving children more exposed to these harmful effects. This research focuses on the detection and prediction of nudity and violence in cartoon videos, which can lead to an effective filtering mechanism. Two major deep neural network models, ResNet50 and VGG-16, are used in this research and trained on a custom-made dataset. Because no standard dataset is available, a special dataset was developed by collecting cartoon videos in three categories: Nude, Violent, and Normal. A total of 340 video clips from 25 different series were collected, and from these clips a total of 44,430 frames were selected to train and test the proposed model. A 117-layer neural network is designed, grouped into 11 broad layer types. The input layer is 100 x 100 x 3; the convolutional layer uses a (1,1) filter size with 256 filters; the activation layer has dimensions 50 x 50 x 64; and the output layer has size 3, i.e., Nude, Violent, and Normal. This model achieves an accuracy of 97.25%, outperforming other state-of-the-art technologies.
Additionally, the proposed model is compared with VGG-16, which achieved 92.31% accuracy.
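The abstract describes the network only by its input size (100 x 100 x 3), a (1,1) convolutional filter, and a 3-way output; the full 117-layer architecture is not specified here. The following NumPy sketch illustrates, under those stated shapes, how a 1x1 convolution followed by pooling and a 3-class softmax head would map a single video frame to class probabilities. The channel counts of the intermediate layers, the pooling choice, and all weights are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

def conv1x1(x, w, b):
    # A 1x1 convolution is a per-pixel linear map across channels,
    # followed here by a ReLU activation.
    # x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)
    return np.maximum(x @ w + b, 0.0)

def softmax(z):
    # Numerically stable softmax over the class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical shapes loosely following the abstract:
# a 100x100x3 input frame, a 1x1 conv expanding to 256 channels,
# global average pooling, and a 3-way head (Nude / Violent / Normal).
rng = np.random.default_rng(0)
frame = rng.random((100, 100, 3))                      # one input frame
w1, b1 = 0.1 * rng.standard_normal((3, 256)), np.zeros(256)
w2, b2 = 0.1 * rng.standard_normal((256, 3)), np.zeros(3)

features = conv1x1(frame, w1, b1)      # (100, 100, 256) feature map
pooled = features.mean(axis=(0, 1))    # (256,) global average pool
probs = softmax(pooled @ w2 + b2)      # (3,) class probabilities
```

In a trained model the weights would come from backpropagation over the labeled frames, and the predicted category would be `probs.argmax()`; this sketch only demonstrates the shape flow from frame to 3-class output.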
