Acoustic-Driven Generation of Pathological Speech Reports Using Large Language Models
Abstract
Clinical reports compile patients' histories, treatments, and outcomes, enabling the creation of personalized and effective treatment plans. However, speech disorders are rarely analyzed using such reports, primarily due to the absence of standardized speech protocols. Nevertheless, speech and language therapists (SLTs) can rely on perceptual evaluations, such as the modified Frenchay Dysarthria Assessment (mFDA) scale, to quantify the severity of symptoms across seven categories: breathing, lips, larynx, palate, monotonicity, tongue, and intelligibility. In this paper, we propose using Large Language Models (LLMs) to generate FDA-like text reports from audio recordings. Furthermore, we improve user control over the input to the LLM by extracting acoustic biomarkers (correlated with the categories of the mFDA) and using them as prompts to the language model. To this end, we used speech recordings from 50 Parkinson's disease (PD) patients and 50 healthy controls (HC), whose audio recordings were assessed by three SLTs according to the mFDA. Structured reports are generated by feeding the LLM acoustic biomarkers extracted from the speech signals; only biomarkers correlated with the seven mFDA categories are used. The results demonstrate that the LLMs can generate reports with a BLEU score of 0.789 for PD and 0.836 for HC, showing the potential of our proposed approach for practical medical applications.
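The selection step described above (keeping only acoustic biomarkers correlated with an mFDA category) can be illustrated with a minimal sketch. The biomarker names, values, and correlation threshold below are purely hypothetical; the paper's actual feature set and selection procedure are not specified in this abstract.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_biomarkers(biomarkers, mfda_scores, threshold=0.5):
    """Keep biomarkers whose |r| with a given mFDA category score
    exceeds the threshold. Names and threshold are illustrative only."""
    return {
        name: pearson_r(values, mfda_scores)
        for name, values in biomarkers.items()
        if abs(pearson_r(values, mfda_scores)) >= threshold
    }

# Hypothetical per-patient biomarker values and one mFDA category score:
biomarkers = {
    "jitter": [0.1, 0.2, 0.3, 0.4],   # tracks the category score
    "noise":  [1.0, 1.0, 1.1, 1.0],   # essentially unrelated
}
mfda_larynx = [1, 2, 3, 4]
selected = select_biomarkers(biomarkers, mfda_larynx, threshold=0.8)
```

In a pipeline like the one described, the `selected` biomarkers (and their values for a given patient) would then be serialized into the prompt handed to the LLM.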