Real-Time Lip Reading and Speech Synthesis Using CTC-CNN-BiLSTM Networks with Flask Deployment

Ahmed

Abstract

This paper presents a novel approach to real-time lip reading with speech output, combining MediaPipe, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM) networks, and Connectionist Temporal Classification (CTC), followed by a deployment strategy using Flask. The proposed system aims to transcribe spoken language from silent video sequences by leveraging the spatial and temporal features of lip movements. Experiments on the GRID dataset yield a Character Error Rate (CER) of 8.15% and a Character Accuracy (CA) of 91.85%. Additionally, we outline the deployment process, which enables real-time lip reading through a web application.
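The two reported figures sum to 100%, which is consistent with Character Accuracy being defined as 1 - CER. The abstract does not state the exact formula, so the following is only a minimal sketch under that assumption, using the common definition of CER as total character-level edit distance divided by total reference length. The function names and the sample sentences are illustrative and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Illustrative sentences in the style of GRID commands (not real model output).
references = ["bin blue at f two now"]
predictions = ["bin blue at f too now"]

cer = character_error_rate(references, predictions)
character_accuracy = 1.0 - cer  # assumption: CA is the complement of CER
print(f"CER = {cer:.2%}, CA = {character_accuracy:.2%}")
```

Under this definition, a CER of 8.15% means roughly one character edit for every twelve reference characters.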

Version published to 10.31219/osf.io/ncdtp_v1 on OSF Preprints
Apr 11, 2025

Related articles

Lip Reading with Deep Learning: A Comprehensive Analysis of Model Architectures
Author: Ahmed
No evaluations. Latest version: Apr 11, 2025

Real-Time Amharic Hate Speech Detection in Live Streams and Video Chats
Author: Baye Atnafu Ferede
No evaluations. Latest version: Apr 14, 2025

Lip Reading with Deep Learning: A Comprehensive Analysis of Model Architectures
Author: Ahmed cherif
No evaluations. Latest version: Apr 23, 2025
