SONIVA: Speech recOgNItion Validation in Aphasia


Abstract

Post-stroke aphasia is a major contributor to language impairment and neuro-disability worldwide, making automated assessment a critical research priority. However, the development of clinically validated automatic speech recognition (ASR) systems remains limited by the lack of large, annotated datasets that capture aphasia’s heterogeneous and unpredictable manifestations. We introduce SONIVA (Speech recOgNItion Validation in Aphasia), the largest and most richly annotated database of pathological speech to date, comprising recordings from ≈1,000 stroke survivors (over 200 of them recorded longitudinally) and ≈7,000 age-matched controls. Current annotations cover 576 patients (mean age: 61.23 ± 13.23 years; 69.81% male) and 104 controls (mean age: 61.05 ± 12.05 years; 34% male), with rich linguistic coding and both orthographic and International Phonetic Alphabet (IPA) transcriptions. Foundation models fine-tuned on SONIVA extract linguistic features that correlate with expert transcriptions (Spearman’s r = 0.79 to 0.86; p < 0.0001), while acoustic classifiers achieve 93% stroke classification accuracy. These results position SONIVA as a critical resource that can transform rehabilitation through objective, scalable speech assessment.
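
For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how a Spearman rank correlation between two paired sets of scores can be computed. It is not the authors' evaluation code. The function names (`average_ranks`, `spearman`) are hypothetical, the numbers are made up for illustration and are not SONIVA data, and the sketch assumes the comparison is made on paired per-speaker feature values, which the abstract does not specify.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired scores (illustrative only, NOT data from SONIVA):
# one value per speaker from expert transcripts and from model output.
expert_scores = [12, 30, 7, 45, 22, 18]
model_scores = [10, 33, 9, 41, 15, 25]

rho = spearman(expert_scores, model_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Because the statistic depends only on rank order, it rewards a model that orders speakers the same way the expert-derived scores do, even when the raw values differ in scale.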
