Pertsch: A Corpus of Persian and German Based on Different Speech Elicitation Tasks

Abstract

This paper introduces the Pertsch Corpus, a task-based speech corpus designed to investigate speech production across different communicative situations in Persian and German. The corpus consists of recordings from sixty speakers who completed a fixed sequence of seven elicitation tasks. These range from controlled read speech to more open-ended communicative conditions, including conversation, storytelling, picture description, and voice-message tasks.

All recordings were collected under standardized laboratory conditions and are accompanied by orthographic transcriptions, phonetic segmentation, and multi-layer Praat TextGrid annotations. In addition to orthographic and phonetic information, the corpus includes annotation layers for verbal and non-verbal elements such as pauses, fillers, and repairs.

The within-speaker multi-task design enables systematic comparison of speech behavior across communicative contexts, while the parallel structure across Persian and German supports cross-linguistic investigation under comparable elicitation conditions. The paper documents the design rationale, recording procedure, transcription workflow, and annotation structure of the corpus. It also presents an initial exploratory analysis illustrating how the dataset can be used to investigate variability in speech production across tasks, speakers, and languages through the temporal distribution of verbal and non-verbal elements.
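The annotations are described as multi-layer Praat TextGrids with layers for pauses, fillers, and repairs. To illustrate the kind of per-task measurement this supports, here is a minimal sketch that reads interval tiers from a TextGrid in Praat's long ("ooTextFile") text format and tallies the count and total duration of selected labels. The tier name `nonverbal` and the labels `<pause>`, `<filler>`, and `<repair>` are hypothetical placeholders, not the corpus's documented conventions. The regex-based parser is an assumption-laden sketch, not the authors' tooling.

```python
import re
from collections import defaultdict

# Hypothetical label strings: the real Pertsch tier names and label
# conventions are not given here and must be taken from the corpus documentation.
NON_VERBAL_LABELS = {"<pause>", "<filler>", "<repair>"}

# One interval in the long TextGrid format: xmin, xmax and text on consecutive lines.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*(\S+)\s*\n\s*xmax\s*=\s*(\S+)\s*\n\s*text\s*=\s*"(.*?)"'
)


def parse_interval_tiers(textgrid: str) -> dict:
    """Map each IntervalTier name to a list of (start, end, label) tuples.

    Handles only the long text format; point tiers (TextTier) are skipped,
    and labels containing doubled quotes are not supported.
    """
    tiers = {}
    # Each tier starts with a header like 'item [1]:'; the file header comes first.
    for block in re.split(r"\n\s*item \[\d+\]:", textgrid)[1:]:
        if not re.search(r'class\s*=\s*"IntervalTier"', block):
            continue
        name = re.search(r'name\s*=\s*"(.*?)"', block).group(1)
        tiers[name] = [(float(a), float(b), t) for a, b, t in INTERVAL_RE.findall(block)]
    return tiers


def summarize(intervals, labels=NON_VERBAL_LABELS):
    """Return {label: (count, total_seconds)} for the target labels only."""
    stats = defaultdict(lambda: [0, 0.0])
    for start, end, label in intervals:
        if label in labels:
            stats[label][0] += 1
            stats[label][1] += end - start
    return {label: (n, dur) for label, (n, dur) in stats.items()}


# Toy TextGrid with one hypothetical "nonverbal" tier (not corpus data).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "nonverbal"
        xmin = 0
        xmax = 3
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = ""
        intervals [2]:
            xmin = 1.2
            xmax = 1.7
            text = "<pause>"
        intervals [3]:
            xmin = 1.7
            xmax = 3
            text = "<filler>"
'''

if __name__ == "__main__":
    tiers = parse_interval_tiers(SAMPLE)
    print(summarize(tiers["nonverbal"]))  # {'<pause>': (1, 0.5), '<filler>': (1, 1.3)}
```

Running the same summary per task and per speaker would give a simple basis for comparing how pauses and fillers are distributed across elicitation conditions. For production use, a dedicated TextGrid library would handle the short format, encodings, and quoting rules more robustly than this regex sketch.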