Pertsch: A Corpus of Persian and German Based on Different Speech Elicitation Tasks

Neda Mousavi

Abstract

This paper introduces the Pertsch Corpus, a task-based speech corpus designed to investigate speech production across different communicative situations in Persian and German. The corpus consists of recordings from sixty speakers who completed a fixed sequence of seven elicitation tasks ranging from controlled read speech to more open-ended communicative conditions, including conversation, storytelling, picture description, and voice-message tasks.

All recordings were collected under standardized laboratory conditions and are accompanied by orthographic transcriptions, phonetic segmentation, and multi-layer Praat TextGrid annotations. In addition to orthographic and phonetic information, the corpus includes annotation layers for verbal and non-verbal elements such as pauses, fillers, and repairs.

The within-speaker multi-task design enables systematic comparison of speech behavior across communicative contexts, while the parallel structure across Persian and German supports cross-linguistic investigation under comparable elicitation conditions.

The paper documents the design rationale, recording procedure, transcription workflow, and annotation structure of the corpus, and presents an initial exploratory analysis illustrating how the dataset can be used to investigate variability in speech production across tasks, speakers, and languages through the temporal distribution of verbal and non-verbal elements.

Version published to 10.31234/osf.io/ya3c5_v2 on OSF Preprints
May 11, 2026
Version published to 10.31234/osf.io/ya3c5_v1 on OSF Preprints
Apr 17, 2026
