Toward Non-Invasive Voice Restoration: A Deep Learning Approach Using Real-Time MRI
Abstract
Despite recent advances in brain–computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring any acoustic input. Segmented rtMRI frames are first mapped to articulatory class representations by a Pix2Pix conditional GAN; a convolutional neural network modeling the articulatory-to-acoustic relationship then transforms these representations into synthetic audio waveforms. The resulting waveforms are rendered audible and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. While preliminary, our results suggest that even silent articulatory motion encodes sufficient information to approximate a speaker's vocal characteristics, offering a non-invasive direction for future speech restoration in individuals who have lost their voice or never developed one.
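For concreteness, the minimal sketch below shows how such a speaker-similarity score can be computed with Resemblyzer's public API, comparing an embedding of a reference recording against an embedding of a synthesized waveform. The file paths are illustrative placeholders, and the sketch reflects the general metric rather than the exact evaluation script.

```python
# Minimal sketch of the Resemblyzer-based speaker-similarity evaluation.
# Assumes a reference recording of the target speaker and a waveform
# synthesized by the pipeline; file paths below are placeholders.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

# Load and normalize both utterances (resampled and trimmed of silence).
reference_wav = preprocess_wav("reference_speaker.wav")
synthesized_wav = preprocess_wav("synthesized_output.wav")

# Embed each utterance into Resemblyzer's 256-dimensional d-vector space.
encoder = VoiceEncoder()
ref_embed = encoder.embed_utterance(reference_wav)
syn_embed = encoder.embed_utterance(synthesized_wav)

# The embeddings are L2-normalized, so their inner product is the cosine
# similarity; values near 1.0 indicate strong speaker resemblance.
similarity = float(np.inner(ref_embed, syn_embed))
print(f"Speaker similarity: {similarity:.3f}")
```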