Data-usage descriptors as search metadata: the case of food security data and the National Data Platform (2015-2025)
Abstract
Scientific data is a critical input into research, yet the research data landscape is constantly changing: new datasets emerge, others are retired, and some disappear altogether. Data-usage descriptors can substantially advance research productivity by reducing the time that researchers spend finding new and relevant datasets in their field. This paper describes how to generate data-usage descriptors by identifying how datasets are used in publications and then linking the dataset information to the publication metadata. It also shows how usage descriptors can be used to find related datasets and their usage. It concludes by arguing that the approach is a critical piece of foundational infrastructure that could be deployed in repositories as part of a referenceable, navigable, and contextual data framework. The article contains a reproducible workflow for constructing data-usage descriptors, based on analyzing the full text of publications in the Dimensions database. The illustrative use case is research on food security; the illustrative repository is the National Data Platform.
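To make the idea concrete, the following is a minimal sketch of how a usage descriptor might be assembled. It is not the article's workflow and does not call the Dimensions database. The dataset names, aliases, publication records, and the helpers `find_datasets` and `build_descriptors` are all hypothetical, and dataset mentions are detected by naive substring matching, which a real pipeline would replace with something more robust.

```python
from collections import Counter, defaultdict

# Hypothetical dataset names and the phrases used to spot them in text.
DATASET_ALIASES = {
    "Household Food Security Survey": ["household food security survey"],
    "Food Access Atlas": ["food access atlas"],
}

# Hypothetical publication records: metadata plus full text.
PUBLICATIONS = [
    {"id": "pub1", "year": 2018, "topics": ["food insecurity", "nutrition"],
     "full_text": "We use the Household Food Security Survey "
                  "and the Food Access Atlas."},
    {"id": "pub2", "year": 2021, "topics": ["food insecurity", "poverty"],
     "full_text": "Estimates draw on the Food Access Atlas."},
]


def find_datasets(text, aliases):
    """Return the set of dataset names whose aliases appear in the text."""
    lowered = text.lower()
    return {name for name, phrases in aliases.items()
            if any(phrase in lowered for phrase in phrases)}


def build_descriptors(publications, aliases):
    """Link each mentioned dataset to the metadata of the citing publications."""
    descriptors = defaultdict(lambda: {
        "publications": [],             # ids of publications using the dataset
        "years": [],                    # publication years, in input order
        "topics": Counter(),            # how often each topic co-occurs
        "related_datasets": Counter(),  # datasets used in the same publication
    })
    for pub in publications:
        mentioned = find_datasets(pub["full_text"], aliases)
        for name in mentioned:
            entry = descriptors[name]
            entry["publications"].append(pub["id"])
            entry["years"].append(pub["year"])
            entry["topics"].update(pub["topics"])
            entry["related_datasets"].update(mentioned - {name})
    return dict(descriptors)


descriptors = build_descriptors(PUBLICATIONS, DATASET_ALIASES)
for name, entry in sorted(descriptors.items()):
    print(name, entry["publications"], dict(entry["related_datasets"]))
```

In this sketch, the `related_datasets` counter is what supports the navigation step: starting from one dataset, a user can move to others that appear in the same publications and inspect their descriptors in turn.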