Analyzing the Naming Conventions of Life Science Data Resources to Inform Human and Computational Findability

Abstract

This study evaluated the names of life science data resources and considered their impact on findability, a core feature of the FAIR (Findability, Accessibility, Interoperability, and Reusability) Principles. Using a previously published list of unique data resources, we identified and validated data resources for which both a common name and a full name were available (n = 1153). From this set, we analyzed characteristics of resource names to determine whether any naming conventions have emerged organically. Additionally, because common names are often used in the absence of a resource's full name, we tested whether any meaning could be inferred from common names alone. Our results highlight suboptimal naming practices and widespread opacity in common names, which pose challenges to resource identification and retrieval by both human-centric and computational methods. These results are informative for those who establish and promote data resources, as well as for those who search for data to use in individual research projects, develop data discovery systems, analyze the scientific literature, or assess research infrastructure. The findings underscore the value of findability in the FAIR Principles and of current efforts to develop infrastructure that supports more efficient communication and global connectedness.
