BioMedGraphica: An All-in-One Platform for Joint Textual Biomedical Prior Knowledge and Numeric Graph Generation

Heming Zhang
Shunning Liang
Tim Xu
Wenyu Li
Di Huang
Yuhan Dong
Guangfu Li
J. Philip Miller
S. Peter Goedegebuure
Marco Sardiello
Jonathan Cooper
William Buchser
Patricia Dickson
Ryan C. Fields
Carlos Cruchaga
Yixin Chen
Michael Province
Philip Payne
Fuhai Li

Abstract

Multi-omic data analysis is essential for scientific discovery in precision medicine. However, translating the statistical results of omic data analysis into novel scientific hypotheses remains a significant challenge. Human experts must manually review analysis results and generate new hypotheses based on extensive, interconnected biomedical prior knowledge, a process that is subjective and does not scale. While large language models (LLMs) can accelerate discovery, their reasoning improves when grounded in structured, auditable, and comprehensive biomedical prior knowledge. Biomedical knowledge, however, is scattered across heterogeneous databases that use diverse and inconsistent nomenclature systems, making it difficult to integrate these resources into a unified format for scalable analysis. This fragmentation limits the ability of AI systems to fully leverage biomedical data for scientific discovery. To address these challenges, we developed BioMedGraphica, an all-in-one platform that harmonizes fragmented biomedical resources by integrating 11 entity types and 30 relation types from 43 databases into a unified knowledge graph containing 2,306,921 entities and 27,232,091 relations. In addition, to the best of our knowledge, this is the first work to propose a Textual-Numeric Graph (TNG) data structure for multi-omics data analysis. In a TNG, textual information captures prior biological knowledge (e.g., transcription start sites, functions, mechanisms), numeric values represent quantitative biomedical features, and the integrated relations can help uncover mechanisms. By bridging prior knowledge with user-specific data, the TNG is a novel data structure well suited to the development of graph foundation models, with the potential to improve prediction performance and interpretability, while also augmenting LLMs by supplying graph-structured mechanistic context to strengthen reasoning.
The BioMedGraphica code is available on GitHub: https://github.com/FuhaiLiAiLab/BioMedGraphica. The BioMedGraphica knowledge graph data can be downloaded as a Hugging Face dataset: https://huggingface.co/datasets/FuhaiLiAiLab/BioMedGraphica
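
To make the Textual-Numeric Graph idea more concrete, the following is a minimal sketch of how such a structure might be represented: each entity carries a textual description (prior knowledge) and an optional numeric value (a user-specific measurement), and typed relations connect entities. This is an illustration only, not BioMedGraphica's actual schema or API; the class names, field names, identifiers (`G1`, `P1`, `X9`), and the relation label are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    """One node: textual prior knowledge plus an optional numeric feature."""
    entity_id: str                 # hypothetical unified identifier
    entity_type: str               # e.g. "gene" or "protein"
    description: str               # textual prior knowledge
    value: Optional[float] = None  # user-specific measurement, if any


@dataclass
class TextualNumericGraph:
    """Hypothetical container; not the data model used by the authors."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each relation is stored as (source_id, relation_type, target_id).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def add_relation(self, source: str, relation_type: str, target: str) -> None:
        # Reject edges whose endpoints are unknown so the graph stays consistent.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both endpoints must be added before the relation")
        self.relations.append((source, relation_type, target))

    def attach_values(self, measurements: Dict[str, float]) -> List[str]:
        """Map user measurements onto nodes; return IDs that matched no node."""
        unmatched = []
        for entity_id, value in measurements.items():
            if entity_id in self.entities:
                self.entities[entity_id].value = value
            else:
                unmatched.append(entity_id)
        return unmatched

    def neighbors(self, entity_id: str) -> List[Tuple[str, str]]:
        """Return (relation_type, target_id) pairs for outgoing edges."""
        return [(rel, dst) for src, rel, dst in self.relations if src == entity_id]


# Usage with made-up identifiers and descriptions.
tng = TextualNumericGraph()
tng.add_entity(Entity("G1", "gene", "Illustrative gene description"))
tng.add_entity(Entity("P1", "protein", "Illustrative protein description"))
tng.add_relation("G1", "gene-protein", "P1")  # hypothetical relation label
unmatched = tng.attach_values({"G1": 2.5, "X9": 0.1})
```

In this sketch, measurements whose identifiers do not resolve to a known entity are returned rather than silently dropped, which mirrors the harmonization concern raised in the abstract: user data only becomes useful once its identifiers map onto the unified graph.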

Version published to 10.1101/2024.12.05.627020 on bioRxiv
Dec 9, 2024

Related articles

PRESSnet: a novel framework for patient stratification and biomarker discovery using clinical knowledge graphs

This article has 11 authors:
1. Jake Cohen-Setton
2. Shruti Shikhare
3. Ioannis Kagiampakis
4. Domingo Salazar
5. Miguel Goncalves
6. Elizabeth Coker
7. Sanddhya Jayabalan
8. Damian Bikiel
9. Ben Sidders
10. Etai Jacob
11. Krishna Bulusu
This article has no evaluations. Latest version: Dec 15, 2025

Deep Learning Architectures for Multi-Omics Data Integration: Bridging Biomarker Discovery and Clinical Translation

This article has 2 authors:
1. Akshay Krishnan Pushparaj
2. Malarmathi Muthukumar
This article has no evaluations. Latest version: Jan 26, 2026

LLMAgent4Bio: LLM Agents for Biological Intelligence Across Genomics, Proteomics, Spatial Biology, and Biomedicine

This article has 9 authors:
1. Sajib Acharjee Dip
2. Dipanwita Mallick
3. Uddip Acharjee Shuvo
4. Shovito Barua Soummo
5. Fazle Rafsani
6. Bikash Kumar Paul
7. Nazifa Ahmed Moumi
8. Shafayat Ahmed
9. Liqing Zhang
This article has no evaluations. Latest version: Dec 16, 2025
