Enhanced Action Recognition through Deep Spatiotemporal Learning Using 3D CNN and GRU

Abstract

Analyzing large video streams efficiently, with low computational complexity and in real time, remains a challenge, and it makes reacting quickly to unusual actions difficult. Recognizing events from video sequences would nevertheless benefit smart homes, security systems, assisted living facilities, and health monitoring. Although sensing technology has advanced, particularly for 3D video, the techniques used to analyze the data remain under active study. We propose a method for learning spatiotemporal features from videos that combines 3D convolutional neural networks (3D CNNs) with gated recurrent units (GRUs). On the UCF50 dataset, 3D CNNs capture spatiotemporal information better than 2D CNNs, and a uniform design with small 3x3x3 convolution kernels further improves performance. Integrating a GRU with the 3D CNN yields higher accuracy than the 3D CNN alone. Compared with LSTM, GRU achieves higher accuracy (89.89%) and requires less computation time.
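To make the two design choices in the abstract more concrete, the following is a minimal back-of-the-envelope sketch in plain Python, not the authors' implementation. It counts parameters for stacked 3x3x3 convolutions versus a single 5x5x5 convolution, and for a GRU layer versus an LSTM layer, using the textbook formulation with one bias vector per gate (some libraries add a second bias). The channel count (64), feature size (512), and hidden size (256) are hypothetical values chosen for illustration; the abstract does not state the actual layer sizes. Lower parameter counts are one commonly cited reason for the reported trends, not an explanation taken from the article.

```python
def conv3d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a 3D convolution with a cubic kernel."""
    return out_channels * (in_channels * kernel ** 3 + 1)


def receptive_field(kernel, num_layers):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def recurrent_params(input_size, hidden_size, num_blocks):
    """Parameters of a recurrent layer built from num_blocks weight blocks.

    Each block has input weights, recurrent weights, and one bias vector
    (textbook formulation; an assumption, since libraries differ).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block


def gru_params(input_size, hidden_size):
    # GRU: update gate, reset gate, candidate state -> 3 blocks.
    return recurrent_params(input_size, hidden_size, 3)


def lstm_params(input_size, hidden_size):
    # LSTM: input, forget, and output gates plus cell candidate -> 4 blocks.
    return recurrent_params(input_size, hidden_size, 4)


if __name__ == "__main__":
    channels = 64            # hypothetical channel count
    features, hidden = 512, 256  # hypothetical feature and hidden sizes

    # Two stacked 3x3x3 layers see the same 5x5x5 region as one 5x5x5 layer.
    print("receptive field, two 3x3x3 layers:", receptive_field(3, 2))
    print("params, two 3x3x3 layers:", 2 * conv3d_params(channels, channels, 3))
    print("params, one 5x5x5 layer: ", conv3d_params(channels, channels, 5))

    print("GRU params: ", gru_params(features, hidden))
    print("LSTM params:", lstm_params(features, hidden))
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is consistent with the shorter computation time reported in the abstract, although measured runtime also depends on hardware and implementation.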
