From Open Data to Open Science?: A Semantic Diagnosis of Public Science and Technology Data

Abstract

Open Science has emerged as a central paradigm in contemporary science and technology (S&T) policy, with Open Data widely regarded as one of its core components. Despite this prominence, limited empirical attention has been paid to whether Open Data occupies a structurally meaningful position within the semantic architecture of Open Science discourse. This study conducts a computational semantic analysis of documents related to public S&T data to diagnose the conceptual relationship between Open Data and Open Science. Using BERTopic-based modeling and hierarchical clustering, we examine how Open Data is positioned within the broader Open Science discourse, focusing on its centrality, proximity to key Open Science concepts, and alignment with FAIR principles. The results reveal that while Open Data is frequently referenced, it exhibits a distinct core-periphery structure: administrative and management-oriented metadata occupy a central semantic position, whereas scientifically rich raw data tend to remain on the periphery. The structural analysis further indicates that the semantic integration of Open Data remains uneven across domains, suggesting a partial decoupling between policy expectations and conceptual implementation. By providing a semantic diagnosis of Open Data within Open Science discourse, this study contributes to scientometric research by offering a structural perspective on how foundational concepts of Open Science are articulated and operationalized in practice. The findings highlight the need to move beyond declarative commitments toward a more conceptually integrated understanding of Open Data in the evolution of Open Science.