BioMedGraphica: An All-in-One Platform for Joint Textual Biomedical Prior Knowledge and Numeric Graph Generation
Abstract
Multi-omic data analysis is essential for scientific discovery in precision medicine. However, translating the statistical results of omic data analysis into novel scientific hypotheses remains a significant challenge. Human experts must manually review analysis results and generate new hypotheses from extensive, interconnected biomedical prior knowledge, a process that is subjective and does not scale. Large language models (LLMs) can accelerate discovery, but their reasoning improves when it is grounded in structured, auditable, and comprehensive biomedical prior knowledge. That knowledge, however, is scattered across heterogeneous databases that use diverse and inconsistent nomenclature systems, which makes it difficult to integrate resources into a unified format for scalable analysis. This fragmentation limits the ability of AI systems to fully leverage biomedical data for scientific discovery. To address these challenges, we developed BioMedGraphica, an all-in-one platform that harmonizes fragmented biomedical resources by integrating 11 entity types and 30 relation types from 43 databases into a unified knowledge graph containing 2,306,921 entities and 27,232,091 relations. To the best of our knowledge, this is also the first work to propose a Textual-Numeric Graph (TNG) data structure for multi-omics data analysis. In a TNG, textual information captures prior biological knowledge (e.g., transcription start sites, functions, mechanisms), numeric values represent quantitative biomedical features, and the integrated relations can help uncover mechanisms. By bridging prior knowledge with user-specific data, TNG provides a novel data structure well suited to the development of graph foundation models, with the potential to improve prediction performance and interpretability, and to augment LLMs by supplying graph-structured mechanistic context that strengthens reasoning.
The BioMedGraphica code is available on GitHub: https://github.com/FuhaiLiAiLab/BioMedGraphica and the BioMedGraphica knowledge graph data can be downloaded from the Hugging Face dataset: https://huggingface.co/datasets/FuhaiLiAiLab/BioMedGraphica
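To make the Textual-Numeric Graph idea concrete, the following minimal Python sketch shows one way a graph could pair textual prior knowledge with user-supplied numeric values on the same entities. It is an illustration under stated assumptions, not the BioMedGraphica implementation: the class names, field names, identifiers (`gene:G1`, `protein:P1`, `gene:G9`), the relation label, and the descriptions are all hypothetical and do not come from the BioMedGraphica code or data.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One node: textual prior knowledge plus a numeric feature."""
    entity_id: str      # hypothetical identifier, not a real BioMedGraphica ID
    entity_type: str    # e.g. "gene" or "protein"
    description: str    # textual prior knowledge about the entity
    value: float = 0.0  # numeric feature, e.g. a measured expression level


@dataclass
class TextualNumericGraph:
    """Illustrative container; the real TNG format may differ."""
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (source, type, target)

    def add_entity(self, entity):
        self.entities[entity.entity_id] = entity

    def add_relation(self, source_id, relation_type, target_id):
        # Relations are only allowed between entities already in the graph.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both endpoints must be added before the relation")
        self.relations.append((source_id, relation_type, target_id))

    def set_values(self, measurements):
        """Attach user-specific numeric data; return IDs that did not match."""
        unmatched = []
        for entity_id, value in measurements.items():
            if entity_id in self.entities:
                self.entities[entity_id].value = float(value)
            else:
                unmatched.append(entity_id)
        return unmatched

    def neighbors(self, entity_id):
        """Outgoing (relation_type, target_id) pairs for one entity."""
        return [(rel, tgt) for src, rel, tgt in self.relations
                if src == entity_id]


# Build a two-node graph from hypothetical prior knowledge.
graph = TextualNumericGraph()
graph.add_entity(Entity("gene:G1", "gene",
                        "Hypothetical gene with a known transcription start site."))
graph.add_entity(Entity("protein:P1", "protein",
                        "Hypothetical protein encoded by G1."))
graph.add_relation("gene:G1", "gene-protein", "protein:P1")

# Overlay user measurements; "gene:G9" is not in the graph and is reported.
unmatched = graph.set_values({"gene:G1": 2.5, "gene:G9": 1.0})
```

Returning unmatched identifiers rather than silently dropping them reflects the nomenclature problem the abstract describes: user data keyed by an identifier that the graph does not recognize must first be mapped to a harmonized entity.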