Assembly and reasoning over semantic mappings at scale for biomedical data integration

Charles Tapley Hoyt
Klas Karis
Benjamin M Gyori

Abstract

Motivation: Hundreds of resources assign identifiers to biomedical concepts including genes, small molecules, biological processes, diseases, and cell types. Often, these resources overlap by assigning identifiers to the same or related concepts. This creates a data interoperability bottleneck, because integrating data sets and knowledge bases that use identifiers from different resources for the same concepts requires those identifiers to be mapped to each other. However, available mappings are incomplete and fragmented across individual resources, motivating their large-scale integration.

Results: We developed SeMRA, a software tool that integrates mappings from multiple sources into a graph data structure. Using graph algorithms, it infers missing mappings implied by available ones while keeping track of provenance and confidence. This makes it possible to connect identifier spaces between which no direct mapping was previously available. SeMRA is customizable and takes as input a declarative specification describing the sources to integrate, along with additional configuration parameters. We make available an aggregated mappings resource produced by SeMRA consisting of 43.4 million mappings from 127 sources that jointly cover identifiers from 445 ontologies and databases. We also describe benchmarks on specific use cases such as integrating mappings between resources cataloging diseases or cell types.

Availability: The code is available under the MIT license at https://github.com/biopragmatics/semra. The mappings database assembled by SeMRA is available at https://zenodo.org/records/15208251.
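The general idea of inferring missing mappings from a graph of available ones can be illustrated with a minimal sketch. This is not SeMRA's API or its actual algorithm: the identifiers, source names, confidence values, the function names `build_graph` and `infer`, and the choice to multiply confidences along a chain are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical direct mappings: (subject, object, source, confidence).
# Identifiers, source names, and confidences are illustrative only.
DIRECT = [
    ("doid:1", "mesh:A", "source_1", 0.9),
    ("mesh:A", "umls:X", "source_2", 0.8),
]


def build_graph(mappings):
    """Index mappings as an undirected graph (equivalence treated as symmetric)."""
    graph = defaultdict(list)
    for subj, obj, source, conf in mappings:
        graph[subj].append((obj, source, conf))
        graph[obj].append((subj, source, conf))
    return graph


def infer(graph, start, max_hops=3):
    """Chain mappings outward from `start`, following at most `max_hops` edges.

    Returns {target: (confidence, [sources along the chain])}. The confidence
    of a chain is assumed to be the product of its edge confidences, and the
    list of sources serves as a simple provenance record.
    """
    best = {start: (1.0, [])}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        conf, provenance = best[node]
        for neighbor, source, edge_conf in graph[node]:
            new_conf = conf * edge_conf
            # Keep only the highest-confidence chain found for each target.
            if neighbor not in best or new_conf > best[neighbor][0]:
                best[neighbor] = (new_conf, provenance + [source])
                queue.append((neighbor, hops + 1))
    del best[start]
    return best
```

With the toy data above, `infer(build_graph(DIRECT), "doid:1")` reaches `umls:X` through `mesh:A`, even though no direct mapping between the two exists, with a confidence of roughly 0.72 (0.9 × 0.8) and both sources recorded as provenance.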

Version published to 10.1101/2025.04.16.649126v1 on bioRxiv
Apr 21, 2025

Related articles

Using semantic search to find publicly available gene-expression datasets

This article has 11 authors:
1. Grace S. Brown
2. James Wengler
3. Aaron Joyce S. Fabelico
4. Abigail Muir
5. Anna Tubbs
6. Amanda Warren
7. Alexandra N. Millett
8. Xinrui Xiang Yu
9. Paul Pavlidis
10. Sanja Rogic
11. Stephen R. Piccolo
This article has no evaluations. Latest version: Mar 15, 2025

FAIR in practice: minimum metadata schema for bioinformatics analytics by machines

This article has 10 authors:
1. Daphne Wijnbergen
2. Núria Queralt-Rosinach
3. Valérie Barbié
4. Emma Verkinderen
5. Nirupama Benis
6. Annika Jacobsen
7. Peter A.C. ’t Hoen
8. Claudio Carta
9. Marco Roos
10. Eleni Mina
This article has no evaluations. Latest version: May 7, 2025

Enhancing bio.tools by Semantic Literature Mining

This article has 10 authors:
1. Aleksandra Szmigiel
2. Ana Mendes
3. Erik Jaaniso
4. Magnus Palmblad
5. Rob M. Ewing
6. Santosh Tirunagari
7. Tess AV Afanasyeva
8. Vedran Kasalica
9. Veit Schwämmle
10. Zunaira Shafique
This article has no evaluations. Latest version: May 4, 2025
