Pertsch: A Corpus of Persian and German Based on Different Speech Elicitation Tasks

Neda Mousavi

Abstract

This paper introduces the Pertsch Corpus, a speech corpus designed to capture variation in speech production across a range of speech elicitation tasks in Persian and German. The corpus consists of recordings of sixty speakers who completed a fixed sequence of seven speech tasks, ranging from controlled reading tasks to more open-ended communicative situations. All recordings were collected under standardized laboratory conditions and are accompanied by orthographic transcriptions, phonetic segmentation, and annotations of verbal and nonverbal elements. The within-speaker multi-task design allows for a systematic comparison of speech across communicative contexts, while the parallel structure across languages supports cross-linguistic analysis. Descriptive statistics provide an overview of the temporal and structural properties of the dataset and illustrate how speech organization varies depending on the task and speaker. Beyond these summaries, the corpus offers a flexible resource for more detailed analyses at various levels, including phonetic, prosodic, and temporal dimensions.

Version published to 10.31234/osf.io/ya3c5_v1 on OSF Preprints
Apr 17, 2026

Related articles

Building a Corpus to Analyze Stuttering according to a Dynamic Model of Speech Rhythm Production

This article has 2 authors:
1. Sandra Merlo
2. Luciana Lucente
This article has no evaluations. Latest version: Mar 2, 2026
Exploring Intonational Contours in Human and Synthetic Speech: An F0-Based Study of Venezuelan Spanish

This article has 2 authors:
1. João Paulo Moraes Lima dos Santos
2. Eugenia San Segundo
This article has no evaluations. Latest version: Feb 28, 2026
