FAIRness assessment of the metadata of omics datasets in online repositories

Nirupama Benis
Eleni Mina
Ronald Cornet


Abstract

Background

Current omics repositories aim to make different types of omics data findable in order to improve data reuse. Data reuse requires sufficient metadata about how the data were produced and processed. This is part of the "rich metadata" that the FAIR principles require. Adopting the FAIR principles will make the ever-increasing amount of omics data more Findable, Accessible, Interoperable and Reusable for machines and humans. This is especially important in the rare disease domain, which often relies on the reuse of data for analysis, given the small number of patients with any specific rare disease. We used 2 methods to investigate the findability and reusability of omics datasets: a generic, automated tool that evaluates any resource for compliance with the FAIR principles, and a use-case-driven assessment specific to the rare disease domain. We chose 7 repositories to test for human and machine findability and reusability.

Results

The metadata of omics datasets presented on the various webpages of the repositories passed, on average, only 2 of the 10 tests performed by the generic FAIR Evaluator. The main reason for the failed tests was the lack of embedded metadata that the tool could scrape from the webpages. The second method, in which we searched for rare disease datasets, showed many false positives among the search results across all repositories. This reduces the precision of the search, which averaged 0.09 over the 7 repositories tested, with a maximum of 0.56 and a minimum of 0.

Conclusions

Making the metadata on omics repositories FAIR will allow users to find and reuse the datasets in those repositories efficiently. To achieve this, based on our analyses, we recommend 5 actions the repositories could take. Among them, providing machine-readable metadata using semantic web technologies like RDF, tagging datasets with metadata items, and proper use of controlled terminologies are three actions that would vastly improve the Findability and Reusability of their datasets.
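To make the precision figures in the Results concrete, the following is a minimal sketch of how search precision could be computed per repository and then averaged. It rests on assumptions not stated in the abstract: that precision is the number of relevant datasets divided by the number of datasets returned, that the reported average is the unweighted mean over repositories, and that an empty result list scores 0. The repository names and counts are hypothetical and are not data from the study.

```python
from statistics import mean


def precision(relevant_hits: int, total_hits: int) -> float:
    """Fraction of returned datasets that are actually relevant."""
    if total_hits == 0:
        # Assumed convention: a search that returns nothing scores 0.
        return 0.0
    return relevant_hits / total_hits


# Hypothetical (relevant hits, total hits) per repository.
# Illustrative numbers only; NOT results from the preprint.
search_results = {
    "repo_a": (5, 10),
    "repo_b": (0, 40),
    "repo_c": (1, 4),
}

# Precision for each repository separately.
per_repo = {
    name: precision(relevant, total)
    for name, (relevant, total) in search_results.items()
}

# Unweighted (macro) average over repositories; an assumption about
# how an "average precision over repositories" might be formed.
macro_avg = mean(per_repo.values())

print(per_repo)
print(macro_avg)
```

Under this definition, every false positive enlarges the denominator without adding to the numerator, which is why a result list dominated by irrelevant datasets drives precision toward 0.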

Version published to 10.21203/rs.3.rs-7820760/v1 on Research Square
Nov 5, 2025

