Towards an open-source model for data and metadata standards

Ariel Rokem
Vani Mandava
Nicoleta Cristea
Anshul Tambay
Andrew J. Connolly

Abstract

Progress in machine learning and artificial intelligence promises to advance research and understanding across a wide range of fields and activities. In tandem, increased awareness of the importance of open data for reproducibility and scientific transparency is making inroads in fields that have not traditionally produced large publicly available datasets. Data sharing requirements from publishers and funders, as well as from other stakeholders, have also created pressure to make datasets with research and/or public interest value available through digital repositories. However, to make the best use of existing data, and facilitate the creation of useful future datasets, robust, interoperable and usable standards need to evolve and adapt over time. The open-source development model provides significant potential benefits to the process of standard creation and adaptation. In particular, data and metadata standards can use long-standing technical and socio-technical processes that have been key to managing the development of software, and which allow incorporating broad community input into the formulation of these standards. On the other hand, open-source models carry unique risks that need to be considered. This report surveys existing open-source standards development, addressing these benefits and risks. It outlines recommendations for standards developers, funders and other stakeholders on the path to robust, interoperable and usable open-source data and metadata standards.

Version published to 10.31219/osf.io/br6u2 on OSF Preprints
Oct 23, 2024

Related articles

Standardized API Call Protocols for implementing Federated Learning in FAIRDatabase

This article has 3 authors:
1. Sem de Regt
2. Roland V. Bumbuc
3. Vivek M. Sheraton
This article has no evaluations. Latest version: Jan 27, 2026

BH25DE report: On the path to machine-actionable training materials

This article has 11 authors:
1. Phil Reed
2. Nick Juty
3. Petra Steiner
4. Leyla Jael Castro
5. Charles Tapley Hoyt
6. Oliver Knodel
7. Martin Voigt
8. Roman Baum
9. Dilfuza Djamalova
10. Jacobo Miranda
11. Alban Gaignard
This article has no evaluations. Latest version: Jan 26, 2026

Manuscript submission systems and metadata completeness in Crossref: patterns and associations

This article has 2 authors:
1. Hans de Jonge
2. Bianca Kramer
This article has no evaluations. Latest version: Jan 5, 2026
