Characterization of metabolic phenotypes in breast cancer through the integration of genome-scale metabolic models and machine learning
Abstract
The metabolic heterogeneity of breast cancer represents a significant challenge for the identification of biomarkers and therapeutic targets. To address this problem, we integrated genome-scale metabolic models with machine learning algorithms, aiming to characterize the metabolic phenotypes associated with the disease.
We generated 90 patient-specific metabolic models from clinical and gene expression data in the TCGA-BRCA project, comprising 66 tumor and 24 normal samples. Metabolic fluxes were estimated by gene expression-based optimization that minimizes a weighted L2 norm. The Mann-Whitney U test with Benjamini-Hochberg correction was then applied to identify the most discriminating reactions.
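The reaction-selection step can be illustrated with a minimal sketch. It assumes the estimated fluxes are arranged as samples-by-reactions matrices, one per group. The function names, the 0.05 threshold, and the synthetic data are illustrative assumptions, not the study's actual code or values.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest rank downwards, then cap at 1.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def discriminating_reactions(tumor_flux, normal_flux, alpha=0.05):
    """Test each reaction (column); return adjusted p-values and a significance mask.

    alpha=0.05 is an assumed threshold, not one stated in the abstract.
    """
    n_reactions = tumor_flux.shape[1]
    pvals = np.array([
        mannwhitneyu(tumor_flux[:, j], normal_flux[:, j],
                     alternative="two-sided").pvalue
        for j in range(n_reactions)
    ])
    qvals = benjamini_hochberg(pvals)
    return qvals, qvals < alpha


# Synthetic stand-in for the flux matrices (samples x reactions); not study data.
rng = np.random.default_rng(0)
tumor_flux = rng.normal(0.0, 1.0, size=(66, 20))
normal_flux = rng.normal(0.0, 1.0, size=(24, 20))
tumor_flux[:, :5] += 3.0  # shift the first five reactions in the tumor group

qvals, significant = discriminating_reactions(tumor_flux, normal_flux)
print("significant reactions:", np.flatnonzero(significant))
```

The correction is written out by hand so that the sketch does not depend on a recent SciPy release; any equivalent library routine could be substituted.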
We evaluated five classification algorithms (K-Nearest Neighbors, Support Vector Machines, Logistic Regression, Decision Tree, and Naive Bayes) using stratified 5-fold cross-validation. All models differentiated healthy from cancerous phenotypes with good overall performance; K-Nearest Neighbors and Support Vector Machines performed best, achieving accuracy close to 0.98 and a ROC-AUC of 1.00.
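A comparison of this kind could be set up with scikit-learn roughly as follows. This is only a sketch under stated assumptions: the feature scaling step, the hyperparameters, the Gaussian variant of Naive Bayes, and the synthetic feature matrix are all illustrative choices, since the abstract does not specify them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected fluxes (66 tumor, 24 normal); not study data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(1.5, 1.0, size=(66, 10)),  # tumor samples
    rng.normal(0.0, 1.0, size=(24, 10)),  # normal samples
])
y = np.array([1] * 66 + [0] * 24)

# Hyperparameters below are assumptions chosen for illustration.
classifiers = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Stratification keeps the tumor/normal ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    # Scaling inside the pipeline is fitted on training folds only.
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
    results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    print(f"{name}: accuracy={results[name][0]:.2f}, ROC-AUC={results[name][1]:.2f}")
```

With 66 tumor and 24 normal samples, a classifier that always predicts the majority class already reaches about 0.73 accuracy, so ROC-AUC is a useful complement to accuracy here.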
Analysis of the discriminating metabolic reactions revealed significant alterations in pathways such as extracellular transport (up to 60 significant reactions), fatty acid oxidation, and nucleotide interconversion. These results highlight the potential of combining metabolic modeling with machine learning to deepen the understanding of tumor metabolism, although experimental validation and statistical refinement are needed in future studies.
Author summary
Breast cancer is metabolically heterogeneous, which makes it difficult to find effective biomarkers and therapies. We propose a computational approach that combines genome-scale metabolic models with machine learning. Using clinical and gene expression data from the TCGA-BRCA project, we generated patient-specific models, predicted metabolic fluxes, selected the most discriminating ones, and evaluated several supervised classification algorithms for distinguishing normal from tumor tissue based on these fluxes.
Our results identify distinct metabolic patterns in breast cancer and highlight pathways such as extracellular transport and fatty acid oxidation. This study demonstrates the potential of these tools for characterizing tumor metabolism, although the sample size (n=90) is a limitation, and future studies with more data are needed to confirm and generalize our findings. Nevertheless, this work is a step towards more precise and personalized strategies for breast cancer diagnosis or treatment.