The Death of the Author, Reconsidered: Spatial and Demographic Constraints on College Admissions Essay Writing
Abstract
Computational text analysis has grown in popularity among social scientists due to the massive influx of digitized data available to study. However, much of this research disconnects patterns observed in text from information about the original authors. Eliding authorship from sociological analysis of text can lead to claims about trends that are detached from the social actors, conditions, interactions, and contexts in which the text was produced. While text analysis without authorship information can yield reasonable inferences about society, complementing that approach with research that explicitly considers the people producing the text could expand the theoretical and empirical scope of work in this area. In this paper, we adapt perspectives from sociolinguistics and explicitly consider categorical identity markers of authors and geography as foundational axes of variation in textual data. We explore these dimensions in a large corpus of college admissions essays (n = 254,820 essays submitted by 83,538 applicants) paired with metadata about applicant identity, including the ZIP code of each applicant's high school. After generating features of the essays using computational methods, we find that author identity markers, such as gender, parental education, and socioeconomic status, are highly salient. ZIP code level socioeconomic measures are also highly correlated with the writing style and content of local applicants. Finally, individuals whose personal identities are spatially unique (that is, demographically different from others in their immediate context) were the most likely to be misclassified by our models, indicating that writing is influenced both socially and spatially. This work clarifies how authorship characteristics, like identity and spatial context, constrain both what we write and how we write it, by showing a strong alignment between text and authors that is observable through machine reading of text.