FAIRness assessment of the metadata of omics datasets in online repositories
Abstract
Background Current omics repositories aim to make different types of omics data findable to improve data reuse. Data reuse requires sufficient metadata about the way the data was produced and processed. This is part of the "rich metadata" that the FAIR principles require. Adoption of these FAIR principles will make the ever-increasing amount of omics data more Findable, Accessible, Interoperable and Reusable for machines and humans. This is especially important in the rare disease domain, which often relies on the reuse of data for analysis, given the small number of patients with a specific rare disease. We used 2 methods to investigate the findability and reusability of omics datasets: a generic, automated tool that evaluates any resource for compliance with the FAIR principles, and a use-case driven assessment specific to the rare disease domain. We chose 7 repositories to test for human and machine findability and reusability.

Results The metadata of omics datasets presented on the repositories' webpages passed, on average, only 2 of the 10 tests run by the generic FAIR Evaluator. The main reason for the failed tests was that the webpages lacked embedded metadata that the tool could scrape. The second method, in which we searched for rare disease datasets, showed that the search results contained many false positives in all repositories. This reduces the precision of the search, which over the 7 repositories tested was 0.09 on average, with a maximum of 0.56 and a minimum of 0.

Conclusions Making metadata on omics repositories FAIR will allow users to find and reuse the datasets on the repositories efficiently. To achieve this, based on our analyses, we recommend 5 actions the repositories could take. Providing machine-readable metadata using semantic web technologies like RDF, tagging datasets with metadata items, and properly using controlled terminologies are three actions that would vastly improve the Findability and Reusability of their datasets.
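
The search precision reported in the Results can be read as the fraction of returned datasets that are actually relevant to the rare disease query. The following is a minimal sketch of that calculation, assuming the standard definition of precision (true positives divided by all returned results). The function names and the counts in the usage example are hypothetical and are not taken from the study, and treating a search with no results as precision 0 is an assumption of this sketch.

```python
from statistics import mean


def search_precision(relevant_hits: int, total_hits: int) -> float:
    """Return the precision of one search: relevant results / all results.

    Assumption of this sketch: a search that returns no results is
    scored as 0.0 rather than being left undefined.
    """
    if relevant_hits < 0 or total_hits < 0 or relevant_hits > total_hits:
        raise ValueError("need 0 <= relevant_hits <= total_hits")
    if total_hits == 0:
        return 0.0
    return relevant_hits / total_hits


def mean_precision(counts: dict[str, tuple[int, int]]) -> float:
    """Average the per-repository precision over all repositories.

    `counts` maps a repository name to (relevant_hits, total_hits).
    """
    return mean(search_precision(rel, tot) for rel, tot in counts.values())


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from the study.
    example_counts = {
        "repo_a": (5, 10),   # half of the hits are relevant
        "repo_b": (0, 40),   # every hit is a false positive
        "repo_c": (1, 4),
    }
    for name, (rel, tot) in example_counts.items():
        print(name, search_precision(rel, tot))
    print("mean", mean_precision(example_counts))
```

Under this definition, every false positive enlarges the denominator without adding to the numerator, which is why many irrelevant hits push precision toward 0.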