Minimum genomic data sets for rare diseases: A systematic review
Abstract
Background: Minimum data sets (MDS) are used to harmonize the capture and exchange of rare-disease information across studies and care settings, but the genomic component of these frameworks is often inconsistently specified. In our sample of included studies (n = 23), only 2 explicitly reported using whole-exome sequencing (WES) or whole-genome sequencing (WGS), highlighting a persistent gap in genomic method reporting alongside heterogeneity in scope, standards adoption, and reported impacts.

Methods: We performed a systematic review (searches through 2024) to identify publications proposing, developing, or applying MDS that included genomic elements for rare diseases. Screening was conducted in two steps: (i) independent title/abstract screening by two reviewer pairs, with conflicts resolved by a third reviewer, followed by (ii) independent full-text assessment by two reviewers. We extracted study characteristics, MDS domains, intended use context, referenced standards/ontologies, level of genomic reporting, and reported outcomes. Results were summarized with descriptive statistics, Jaccard-based co-occurrence patterns, and exploratory association analyses.

Results: Twenty-three studies met the inclusion criteria, most of them originating from Europe and North America. Clinical/phenotypic information was nearly universal (95.7%), whereas genomic data were included in 69.6% of studies and were usually described without specifying the sequencing modality. Most studies targeted biomedical/genomic research (91.3%) and clinical diagnosis/care (69.6%). Standards use was modest (median = 1 per study); the most frequent were HPO (26.1%), Orphanet/Orphacode (21.7%), FAIR (17.4%), and ICD (8.7%). Reported benefits were more common at the system level (e.g., interoperability or policy-related outputs) than as consistently quantified clinical effects. Exploratory analyses suggested that practices such as planned reanalysis, phenotype–genotype linkage, and explicit handling of structural variants may be associated with greater clinical/knowledge gains than the sequencing modality alone, although the evidence remained insufficient to draw firm causal conclusions.

Conclusions: Rare-disease MDS commonly capture clinical information but often underspecify core genomic details and apply standards inconsistently, limiting comparability and interoperability. Progress would benefit from a minimal genomic reporting core (sequencing approach, reference genome, variant classes, and analysis/annotation pipeline descriptors) aligned with widely used ontologies and interoperability principles, together with routine inclusion of patient-centered outcomes and biospecimen linkages.
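
As an illustration only, the four elements of the proposed minimal genomic reporting core could be represented as a small record with a completeness check. The sketch below is not a schema from the review: the class name, field names, and the `missing_elements` helper are hypothetical, and the example values (such as "GRCh38") are placeholders chosen for the illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class GenomicReportingCore:
    """Hypothetical record for the four core elements named in the abstract."""

    sequencing_approach: str = ""   # e.g. "WES" or "WGS"
    reference_genome: str = ""      # e.g. "GRCh38" (illustrative value)
    variant_classes: List[str] = field(default_factory=list)  # e.g. ["SNV", "SV"]
    pipeline_descriptor: str = ""   # analysis/annotation pipeline description

    def missing_elements(self) -> List[str]:
        """Return the names of core elements left empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A record that states the sequencing modality and reference genome only.
record = GenomicReportingCore(sequencing_approach="WES", reference_genome="GRCh38")
print(record.missing_elements())  # ['variant_classes', 'pipeline_descriptor']
```

A check of this kind could flag data sets that mention genomic data without specifying how they were generated, which is the reporting gap the review describes. In practice, the free-text fields would presumably be bound to controlled vocabularies.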