What is Data? A Conceptual and Empirical Inquiry of the Facets of Data in News Organizations

Abstract

While journalism research increasingly engages with applications of artificial intelligence (AI), it often overlooks the foundational layer that enables AI itself: data. This study conceptualizes "data" in journalism, a field profoundly shaped by data-intensive practices. Integrating perspectives from Science and Technology Studies (STS), infrastructure studies, and journalism practice, this study traces how data is embedded in news organizations. Following a Lasswellian framework, we describe what is datafied (subjects), who produces and processes data (actors), where data circulates (spaces), how data is composed (constructions), and why data is valued (values). To illustrate these facets, we construct a Patchwork News Outlet, a composite case that synthesizes empirical insights from six news organizations in five countries, identifying common data practices and points of divergence. Findings reveal that data practices range from ubiquitous use of data, through selective engagement with data-driven strategies, to strategic resistance to datafication. We offer not one but five definitions of data, each grounded in a specific theoretical stream and derived from an analysis across the dimensions of the Lasswellian framework: data (1) as a socially constructed artifact shaped by power structures, biases, and strategic interests; (2) as an infrastructural entity embedded in socio-technical systems; (3) as a resource for news stories; (4) as machine-readable input optimized for computational processing; and (5) as empirical representations of audience transactions and editorial practices, embedded with biases and contested value. Alongside this conceptual contribution, we introduce a systematic framework for locating research foci across intersecting dimensions of data in journalism.

Article activity feed