Toward Non-Invasive Voice Restoration: A Deep Learning Approach Using Real-Time MRI
Abstract
Despite recent advances in brain–computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring any acoustic input. Segmented rtMRI frames are first mapped to articulatory class representations by a Pix2Pix conditional GAN; a convolutional neural network modeling the articulatory-to-acoustic relationship then transforms these representations into synthetic audio waveforms. The resulting waveforms are rendered audible and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. While preliminary, our results suggest that even silent articulatory motion encodes sufficient information to approximate a speaker's vocal characteristics, offering a non-invasive direction for future speech restoration in individuals who have lost their voice or never developed one.
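For concreteness, the minimal sketch below shows how such a speaker-similarity score can be computed with Resemblyzer's public API, comparing an embedding of a reference recording against an embedding of a synthesized waveform. The file paths are illustrative placeholders, and the sketch reflects the general metric rather than the exact evaluation script.

```python
# Minimal sketch of the Resemblyzer-based speaker-similarity evaluation.
# Assumes a reference recording of the target speaker and a waveform
# synthesized by the pipeline; file paths below are placeholders.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

# Load and normalize both utterances (resampled and trimmed of silence).
reference_wav = preprocess_wav("reference_speaker.wav")
synthesized_wav = preprocess_wav("synthesized_output.wav")

# Embed each utterance into Resemblyzer's 256-dimensional d-vector space.
encoder = VoiceEncoder()
ref_embed = encoder.embed_utterance(reference_wav)
syn_embed = encoder.embed_utterance(synthesized_wav)

# The embeddings are L2-normalized, so their inner product is the cosine
# similarity; values near 1.0 indicate strong speaker resemblance.
similarity = float(np.inner(ref_embed, syn_embed))
print(f"Speaker similarity: {similarity:.3f}")
```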