HONeYBEE: Enabling Scalable Multimodal AI in Oncology Through Foundation Model-Driven Embeddings

Abstract

The increasing availability of multimodal biomedical data has created new opportunities for AI-driven oncology. However, integrating these heterogeneous data sources into unified, high-quality representations remains a critical challenge. We introduce Harmonized ONcologY Biomedical Embedding Encoder (HONeYBEE), a modular, open-source framework that preprocesses diverse oncology datasets, including clinical notes, pathology slides, radiology scans, and molecular profiles, to generate consistent, high-dimensional data representations (also known as embeddings) using domain-specific foundation models. HONeYBEE simplifies downstream applications such as prognosis estimation, cancer subtype classification, and patient retrieval through a unified API. Using multimodal data from over 11,400 patients in The Cancer Genome Atlas (TCGA), we show that HONeYBEE's integrated representations improve cancer-type separability in t-SNE space and enhance survival prediction. Multimodal embeddings outperformed single-modality baselines across all survival models (C-index: CoxPH = 0.6587, RSF = 0.6252, DeepSurv = 0.6621). Fine-tuned clinical embeddings further improved classification accuracy by 14.8% compared to pretrained models. By releasing all representations, models, and pipelines, HONeYBEE enables reproducible, scalable, and clinically meaningful AI workflows for precision oncology.
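
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only, not HONeYBEE's evaluation code: the abstract does not say which C-index implementation was used, and the function name `concordance_index`, its arguments, and the toy data are all hypothetical. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the paper's code).

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score per patient (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk assigned to the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.2]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; for a cohort of the size described here, an established library implementation would be the more practical choice.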
