Sharing and preserving sociolinguistic corpora on the U.S.-Mexico Border

Abstract

Since William Labov outlined the methodology for the sociolinguistic interview in 1972, sociolinguistic corpora have been widely used to study diverse speech communities and linguistic features. However, most of these invaluable collections have been available only to the individual researcher or research group that created them, and the data sets usually fall out of use when that scholar stops working with them. More recently, there has been a push towards data sharing in sociolinguistics, reflecting data sharing practices and the open science movement in other fields. Still, accessible online sociolinguistic corpora are few and far between, in part because of the intense time commitment required to create, sustain, share, and preserve such collections. This paper reviews two accessible online sociolinguistic collections at the U.S.-Mexico border: the Corpus de Español en el Sur de Arizona [Corpus of Spanish in Southern Arizona] or CESA (Carvalho, 2012) and the Corpus Bilingüe del Valle [Bilingual Corpus of the Valley] or CoBiVa (Christoffersen & Bessett, 2019; Christoffersen & Ciller, 2024) in South Texas. We explore these two corpora as case studies for data sharing and preservation through collaboration by detailing their data collection protocols, data management protocols, and preservation plans. In doing so, we demonstrate how data sharing in sociolinguistics impacts accessibility, reproducibility, and the democratization of knowledge.
