Decoding Breast Cancer Heterogeneity via Multi-Omics Integration and Language Model-Based Interpretation

Robail Yasrab
Ruchit Agrawal
Maha Mohamed Saber-Ayad
Mohamed El-Hadidi

Abstract

We present a novel pipeline combining Multi-Omics Factor Analysis (MOFA) and fine-tuned Large Language Models (LLMs) to predict breast cancer subtypes using proteomics, DNA methylation, and RNA-Seq data. Breast cancer is a heterogeneous disease characterized by diverse molecular alterations across multiple biological layers, necessitating integrative approaches for accurate subtype classification. Our methodology leverages MOFA for dimensionality reduction to identify key latent factors driving heterogeneity, followed by LLM fine-tuning on these multi-omics signatures to enhance prediction accuracy. MOFA analysis identified five key latent factors capturing distinct biological processes: immune response, cell cycle regulation, metabolic reprogramming, tumor microenvironment interactions, and DNA repair mechanisms. We extracted the top features per omics layer for each factor and performed Gene Set Enrichment Analysis (GSEA) to characterize their biological significance. Our LLM, trained on curated multi-omics signatures and clinical metadata encoded as structured text prompts, significantly outperformed conventional statistical models in subtype classification, achieving AUC=0.93 and accuracy=0.89, compared to Random Forest (AUC=0.87, accuracy=0.82) and SVM (AUC=0.85, accuracy=0.80). The superior performance of our approach is attributed to the LLM’s ability to capture complex, non-linear relationships and hierarchical feature interactions across omics layers. This integrative pipeline provides both improved predictive performance and interpretable biological insights, offering potential for enhanced clinical decision-making in breast cancer management.
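The abstract does not say how the top features are ranked or how the prompts are laid out, so the following is only a minimal sketch under stated assumptions. It assumes a factor-loading matrix (features by factors) is already available from a factor model such as MOFA, ranks features by absolute loading, and writes the result as a line-oriented text prompt. The function names, the toy loadings, the placeholder feature names, and the prompt layout are all hypothetical and are not taken from the preprint.

```python
import numpy as np


def top_features_per_factor(loadings, feature_names, k=2):
    """For each factor (column), return the k features with the largest |loading|.

    Assumption: ranking by absolute loading; the preprint does not state its criterion.
    """
    loadings = np.asarray(loadings, dtype=float)
    top = {}
    for j in range(loadings.shape[1]):
        # Sorting the negated absolute values gives descending |loading|.
        order = np.argsort(-np.abs(loadings[:, j]), kind="stable")[:k]
        top[j] = [(feature_names[i], float(loadings[i, j])) for i in order]
    return top


def encode_prompt(top_by_layer, clinical):
    """Serialize per-layer top features and clinical metadata as plain text.

    Hypothetical layout: one line per (layer, factor) pair, then one line per clinical field.
    """
    lines = []
    for layer, top in top_by_layer.items():
        for factor, feats in top.items():
            names = ", ".join(f"{name} ({w:+.2f})" for name, w in feats)
            lines.append(f"{layer} | factor {factor + 1}: {names}")
    for key, value in clinical.items():
        lines.append(f"clinical | {key}: {value}")
    return "\n".join(lines)


# Toy loadings for one omics layer: 4 placeholder features, 2 factors (made-up numbers).
rna_loadings = np.array([
    [0.90, 0.10],
    [-0.20, 0.80],
    [0.05, -0.95],
    [-0.70, 0.00],
])
rna_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

top = top_features_per_factor(rna_loadings, rna_names, k=2)
prompt = encode_prompt({"rna": top}, {"age": 54})
print(prompt)
```

In a real setting the same ranking would be repeated for each omics layer (proteomics, DNA methylation, RNA-Seq), and the resulting text would be paired with a subtype label to form one fine-tuning example.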

Version published to 10.1101/2025.06.26.661832 on bioRxiv
Jun 30, 2025

Related articles

Multi-Omic Integration and Machine Learning Reveal Regulatory Networks Driving Breast Cancer Progression

This article has 2 authors:
1. Unmilita Das Moon
2. Kushal Raj Roy
This article has no evaluations
Latest version Dec 11, 2025

Deep Learning Architectures for Multi-Omics Data Integration: Bridging Biomarker Discovery and Clinical Translation

This article has 2 authors:
1. Akshay Krishnan Pushparaj
2. Malarmathi Muthukumar
This article has no evaluations
Latest version Jan 26, 2026

Integrative Multi-Omics Profiling and Machine Learning Identify Key Molecular Determinants Distinguishing Glioblastoma from Lower-Grade Glioma

This article has 2 authors:
1. Amir Mahdi Taghizadeh
2. Pourya Soflaee
This article has no evaluations
Latest version Jan 5, 2026
