Lip Reading with Deep Learning: A Comprehensive Analysis of Model Architectures
Abstract
Lip reading, a pivotal skill in augmenting communication for the hearing impaired, has advanced significantly with deep learning techniques. This study presents a comprehensive analysis of deep learning model architectures for lip reading using a newly constructed dataset, DATAV1. We explore and evaluate multiple architectures, including ResBlock3D, Conv3D, Conv2D, TimeDistributed, attention mechanisms, and LSTM. Through extensive experimentation and rigorous evaluation metrics, we identify and discuss one of the best-performing architectures for accurate lip reading, which achieves a peak validation accuracy of 98.18%. This research offers insights into effective model selection and lays the groundwork for further advances in human-machine communication through lip reading systems.