IIT Delhi Dialogue Corpus: A Quantitative Analysis of a Spoken Corpus of Hindi

Abstract

We present our effort to create a dialogue corpus for Hindi with the aim of understanding (a) the nature of linguistic utterances during naturalistic dialogue, (b) what these linguistic patterns tell us about the cognitive processes/constraints that affect production and comprehension during dialogue, and (c) how such processes/constraints differ from those in written text. We discuss the procedure and pipeline employed to create two sets of spoken data: telephonic conversation data and face-to-face (task-oriented) conversation data. At the lexical level, the data has been annotated for information such as disfluencies and code-switching, and at the syntactic level for part-of-speech tags and dependency relations. We present a preliminary analysis of the created dialogue data and compare it with a written text to discuss the usefulness and implications of this resource for psycholinguistic research.
