Building an Interoperable Rare Disease Multi-omic Resource: The GREGoR Data Model and Dataset

Benjamin D Heavner
Marsha M Wheeler
Jesse D Bengtsson
Claudia M B Carvalho
Warren A Cheung
Matthew P Conomos
Emmanuele C Delot
Stephanie DiTroia
Vijay S Ganesh
Stephanie M Gogarten
Christopher M Grochowski
Shalini N Jhangiani
Charles H King
Cas LeMaster
Colby T Marvin
Shruti Marwaha
Danny E Miller
Anne O'Donnell-Luria
Lynn Pais
Karynne Patterson
Guanghao Qi
Matthew Richardson
Craig Smail
Adrienne M Stilp
Catherine C Tong
Rachel A Ungar
Ben Weisburd
Michael J Bamshad
Jonathan A Bernstein
Evan E Eichler
Richard A Gibbs
James R Lupski
Susanne J May
Stephen B Montgomery
Tomi Pastinen
Jennifer Posey
Heidi L Rehm
Ali Shojaie
Michael E Talkowski
Eric Vilain
Chia-Lin Wei
Matthew T Wheeler
Qian Yi
Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium
GREGoR Consortium Data Standards and Analysis Working Group
Seth I Berger
Jessica X Chong

Abstract

Rare disease research and diagnosis rely on the integration of genomic and phenotypic data generated across diverse clinical sites; however, the absence of widely adopted standards for representing genomic data and associated metadata has limited data interoperability, reuse, and cross-study analysis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was established to investigate challenging rare disease cases and evaluate emerging multi-omic technologies for clinical translation. To support coordinated data integration across distributed research sites, we developed a common Consortium Data Model in partnership with domain experts to standardize the capture of participant-, family-, phenotype-, and assay-level metadata. The model emphasizes a modular architecture so that multiple data versions from multiple omic technologies can be linked to a single individual and a genetic finding can be attributed to the specific technology used for its initial discovery. Adoption of the GREGoR Data Model has enabled continued generation and public release of a harmonized, analysis-ready Consortium Dataset. The most recent release includes phenotypic, family, and multi-omic data from 12,292 participants in 5,029 families. Other rare disease data sharing efforts are beginning to adopt this data model, which will facilitate cross-consortium analyses and empower rare disease research. This work demonstrates that a collaborative, flexible, and scalable data model can support large-scale rare disease research, facilitate cross-center data harmonization, and enable data interoperability.
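
The modular linking described in the abstract can be illustrated with a minimal sketch. This is not the GREGoR Data Model itself: every class, field, and identifier below (`Participant`, `AssayDataset`, `GeneticFinding`, `ConsortiumTables`, the technology labels, and the placeholder IDs) is a hypothetical name chosen for illustration. The sketch only assumes what the abstract states: participants belong to families, one participant can have several versioned datasets from several omic technologies, and each genetic finding points back to the dataset in which it was first discovered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Participant:
    # Hypothetical participant-level record; field names are illustrative.
    participant_id: str
    family_id: str
    phenotype_terms: List[str] = field(default_factory=list)


@dataclass
class AssayDataset:
    # One versioned dataset from one omic technology for one participant.
    dataset_id: str
    participant_id: str
    technology: str  # free-text label here; a real model would constrain it
    version: int


@dataclass
class GeneticFinding:
    # A finding records the dataset in which it was initially discovered.
    finding_id: str
    participant_id: str
    discovery_dataset_id: str


class ConsortiumTables:
    """In-memory stand-in for linked tables keyed by ID (assumed layout)."""

    def __init__(self) -> None:
        self.participants: Dict[str, Participant] = {}
        self.datasets: Dict[str, AssayDataset] = {}
        self.findings: Dict[str, GeneticFinding] = {}

    def add_participant(self, p: Participant) -> None:
        self.participants[p.participant_id] = p

    def add_dataset(self, d: AssayDataset) -> None:
        # Reject datasets that do not link to a known participant.
        if d.participant_id not in self.participants:
            raise KeyError(f"unknown participant: {d.participant_id}")
        self.datasets[d.dataset_id] = d

    def add_finding(self, f: GeneticFinding) -> None:
        # A finding must cite an existing dataset from the same participant.
        source = self.datasets.get(f.discovery_dataset_id)
        if source is None or source.participant_id != f.participant_id:
            raise ValueError("finding must cite a dataset from the same participant")
        self.findings[f.finding_id] = f

    def datasets_for(self, participant_id: str) -> List[AssayDataset]:
        return [d for d in self.datasets.values() if d.participant_id == participant_id]

    def discovery_technology(self, finding_id: str) -> str:
        f = self.findings[finding_id]
        return self.datasets[f.discovery_dataset_id].technology


# Usage with placeholder IDs: one participant, three datasets, one finding.
tables = ConsortiumTables()
tables.add_participant(Participant("P1", "F1", ["TERM:0001"]))
tables.add_dataset(AssayDataset("D1", "P1", "short-read genome", 1))
tables.add_dataset(AssayDataset("D2", "P1", "short-read genome", 2))
tables.add_dataset(AssayDataset("D3", "P1", "long-read genome", 1))
tables.add_finding(GeneticFinding("G1", "P1", "D3"))
print(tables.discovery_technology("G1"))  # long-read genome
```

Keeping findings linked to a dataset ID rather than to a technology name directly means that reprocessing a sample (a new `version`) does not overwrite the record of where a finding was first made, which is one plausible reading of the attribution goal described above.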

Version published to 10.64898/2026.05.15.725546 on bioRxiv
May 19, 2026

Related articles

OmicsPred as a centralised resource for genetic prediction of multi-omic traits

This article has 9 authors:
1. Carles Foguet
2. Laurent Gil
3. Yu Xu
4. Sofía Salazar-Magaña
5. Scott C. Ritchie
6. Elodie Persyn
7. Hae Kyung Im
8. Michael Inouye
9. Samuel A. Lambert
This article has no evaluations. Latest version: May 19, 2026

Interpreting Omics Data Analysis with Large Language Models for Disease Target and Drug Discovery

This article has 10 authors:
1. ZIXI XU
2. Weihang Chen
3. Wuyu Ren
4. Tianqi Xu
5. Somadina Amaechin
6. Raad Khan
7. Yixin Chen
8. Michael Province
9. Philip Payne
10. Fuhai Li
This article has no evaluations. Latest version: May 5, 2026

T-Rex: Standardized Analysis of Germline Variants in Whole-Exome Sequencing Trios

This article has 8 authors:
1. Sara-Luisa Reh
2. Carolin Walter
3. Judith Lohse
4. Tabita Ghete
5. Markus Metzler
6. Anne Quante
7. Julia Hauer
8. Franziska Auer
This article has no evaluations. Latest version: Apr 1, 2026