Real-Time Lip Reading and Speech Synthesis Using CTC-CNN-BiLSTM Networks with Flask Deployment
Abstract
This paper presents a novel approach to real-time lip reading and speech synthesis that combines Connectionist Temporal Classification (CTC), Convolutional Neural Networks (CNN), MediaPipe, and Bidirectional Long Short-Term Memory (Bi-LSTM) networks, followed by a deployment strategy using Flask. The proposed system aims to transcribe spoken language from silent video sequences by leveraging the spatial and temporal features of lip movements. Extensive experiments on the GRID dataset demonstrate the effectiveness of the model, which achieves a Character Error Rate (CER) of 8.15% and a Character Accuracy (CA) of 91.85%. Additionally, we outline the deployment process, which enables real-time lip reading through a web application.
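To make the reported metrics concrete, the following minimal sketch shows one common way to compute CER and CA. It assumes CER is the total character-level edit (Levenshtein) distance divided by the total number of reference characters, and that CA is defined as 100% minus CER, which is consistent with the reported 8.15% and 91.85%. The function names and the sample sentences are hypothetical, and the paper's exact aggregation (corpus-level versus per-utterance averaging) may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER in percent (assumed definition, see lead-in)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


def character_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """CA in percent, assumed here to be 100 - CER."""
    return 100.0 - character_error_rate(refs, hyps)


# Hypothetical GRID-style sentence with one substituted character.
references = ["bin blue at f two now"]
predictions = ["bin blue at s two now"]
cer = character_error_rate(references, predictions)
ca = character_accuracy(references, predictions)
print(f"CER = {cer:.2f}%, CA = {ca:.2f}%")
```

Under this definition a single wrong character in a 21-character reference yields a CER of 1/21, so short GRID-style sentences make the metric sensitive to individual character errors.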