IIT Delhi Dialogue Corpus: A Quantitative Analysis of a Spoken Corpus of Hindi
Abstract
We present our effort to create a dialogue corpus for Hindi with the aim of understanding (a) the nature of linguistic utterances during naturalistic dialogue, (b) what these linguistic patterns tell us about the cognitive processes and constraints that affect production and comprehension during dialogue, and (c) how such processes and constraints differ from those at work in written text. We discuss the procedure and pipeline employed to create two sets of spoken data: telephonic conversation data and face-to-face (task-oriented) conversation data. At the lexical level, the data has been annotated for information such as disfluencies and code-switching, and at the syntactic level for part-of-speech tags and dependency relations. We present a preliminary analysis of the created dialogue data and compare it with a written text to discuss the usefulness and implications of this resource for psycholinguistic research.