Pertsch: A Corpus of Persian and German Based on Different Speech Elicitation Tasks
Abstract
This paper introduces the Pertsch Corpus, a task-based speech corpus designed to investigate speech production across different communicative situations in Persian and German. The corpus consists of recordings from sixty speakers who completed a fixed sequence of seven elicitation tasks ranging from controlled read speech to more open-ended communicative conditions, including conversation, storytelling, picture description, and voice-message tasks.

All recordings were collected under standardized laboratory conditions and are accompanied by orthographic transcriptions, phonetic segmentation, and multi-layer Praat TextGrid annotations. In addition to orthographic and phonetic information, the corpus includes annotation layers for verbal and non-verbal elements such as pauses, fillers, and repairs.

The within-speaker multi-task design enables systematic comparison of speech behavior across communicative contexts, while the parallel structure across Persian and German supports cross-linguistic investigation under comparable elicitation conditions.

The paper documents the design rationale, recording procedure, transcription workflow, and annotation structure of the corpus, and presents an initial exploratory analysis illustrating how the dataset can be used to investigate variability in speech production across tasks, speakers, and languages through the temporal distribution of verbal and non-verbal elements.