Facial Movements Extracted from Video for the Kinematic Classification of Speech

Richard Palmer
Roslyn Ward
Petra Helmholz
Geoffrey R. Strauss
Paul Davey
Neville Hennessey
Linda Orton
Aravind Namasivayam


Abstract

Speech Sound Disorders (SSDs) are prevalent communication problems in children that pose significant barriers to academic success and social participation. Accurate diagnosis is key to mitigating life-long impacts. We are developing a novel software solution, the Speech Movement and Acoustic Analysis Tracking (SMAAT) system, to facilitate rapid and objective assessment of the motor speech control issues underlying SSD. This study evaluates the feasibility of using three-dimensional (3D) facial measurements, automatically extracted from single two-dimensional (2D) front-facing video cameras, for classifying speech movements. Videos were recorded of 51 adults and 77 children between 3 and 4 years of age (all typically developing for their age) saying 20 words from the mandibular and labial-facial levels of the Motor-Speech Hierarchy Probe Wordlist (MSH-PW). Measurements around the jaw and lips were automatically extracted from the 2D video frames using a state-of-the-art facial mesh detection and tracking algorithm, and each individual measurement was tested in a Leave-One-Out Cross-Validation (LOOCV) framework for its word classification performance. Statistics were evaluated at the α=0.05 significance level, and several measurements were found to exhibit significant classification performance in both the adult and child cohorts. Importantly, measurements of depth indirectly inferred from the 2D video frames were among those found to be significant. The significant measurements were shown to match expectations of facial movements across the 20 words, demonstrating their potential applicability in supporting clinical evaluations of speech production.
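
The abstract does not say which classifier or statistical test was used inside the LOOCV framework, so the following is only a minimal sketch of the general idea, under stated assumptions: a single scalar measurement per recording, a nearest-class-mean classifier, and a one-sided binomial test of accuracy against chance. The function names, the toy values, and the labels `word_a` and `word_b` are hypothetical and are not taken from the study.

```python
from math import comb


def loocv_accuracy(samples):
    """Leave-one-out accuracy for one scalar measurement.

    samples: list of (measurement_value, word_label) pairs.
    Classifier (an assumption): assign the held-out sample to the word
    whose training-fold mean measurement is closest.
    """
    correct = 0
    for i, (value, label) in enumerate(samples):
        # Hold out sample i and train on all the others.
        train = samples[:i] + samples[i + 1:]
        totals = {}
        for train_value, train_label in train:
            running_sum, count = totals.get(train_label, (0.0, 0))
            totals[train_label] = (running_sum + train_value, count + 1)
        means = {word: s / n for word, (s, n) in totals.items()}
        predicted = min(means, key=lambda word: abs(means[word] - value))
        correct += predicted == label
    return correct / len(samples)


def binomial_p_value(n_correct, n_total, chance):
    """One-sided P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Toy data (made up): one measurement, e.g. a jaw-opening distance,
# for two hypothetical words.
toy_samples = [
    (1.0, "word_a"), (1.2, "word_a"), (0.9, "word_a"),
    (3.0, "word_b"), (3.1, "word_b"), (2.8, "word_b"),
]

accuracy = loocv_accuracy(toy_samples)
n_correct = round(accuracy * len(toy_samples))
# Chance level is 1 / (number of words); with 20 words it would be 0.05.
p_value = binomial_p_value(n_correct, len(toy_samples), chance=0.5)
significant = p_value < 0.05
```

Repeating this once per measurement gives a per-measurement accuracy and p-value that can be compared against the α=0.05 threshold mentioned in the abstract. A real analysis across many measurements would also need to consider multiple-comparison correction, which the abstract does not discuss.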

Version published to 10.3390/s24227235
Nov 12, 2024
Version published to 10.20944/preprints202410.1591.v1
Oct 21, 2024

Decoding motor imagery related to major mimetic muscles from electroencephalography

This article has 7 authors:
1. Haoran Sun
2. Mengkun Ding
3. Xiaofeng Shan
4. Shang Xie
5. Dongming Chang
6. Nianming Zuo
7. Zhigang Cai
This article has no evaluations. Latest version: Nov 10, 2025
Comparative Analysis of Artificial Intelligence-Based Lateral Facial Analysis with Manual and Radiographic Measurements

This article has 5 authors:
1. Monisha P Chelery
2. Crystal Runa Soans
3. Ravi M Subrahmanya
4. Ashritha P S
5. Shivangi Satnaliwala
This article has no evaluations. Latest version: Oct 27, 2025
High-Density Electromyography of Swallowing and Phonation: Methods for Automated Analysis

This article has 7 authors:
1. Kiara J. W. Miller
2. Recep Avci
3. Gregory B. Sands
4. Jaime Lara
5. Phoebe Macrae
6. Maggie-Lee Huckabee
7. Leo K. Cheng
This article has no evaluations. Latest version: Nov 4, 2025
