Integrating Machine Learning with Metabolic Models for Precision Trauma Care: Personalized ENDOTYPE Stratification and Metabolic Target Identification
Abstract
Purpose: The pipeline developed here uses the metabolic flux characterization of trauma patients as input to a computational framework that combines several machine learning (ML) techniques. The pipeline stratifies trauma patients into ENDOTYPES and pinpoints key metabolic targets associated with patient-specific ENDOTYPES. Previous studies identified four distinct ENDOTYPES based on metabolic profiling of blood samples from patients in the context of shock-induced endotheliopathy (SHINE). These ENDOTYPES, labeled A, B, C, and D, showed notable differences in mortality in a cohort of trauma patients, independent of injury severity. The endothelium, a tissue in direct contact with blood, plays a pivotal role in blood composition, which in turn affects endothelial metabolism and, consequently, pathology. To address this, we developed an approach combining genome-scale metabolic models (GEMs) with patients' plasma metabolomics. Although this methodology yields significant correlations between endothelial metabolic functions and mortality rates in ENDOTYPES, classical linear multivariate techniques did not uncover correlations at the level of individual patients, suggesting a complex, nonlinear relationship. To overcome this limitation, we have developed a novel computational approach that merges machine learning techniques with mechanistic modeling to stratify trauma patients into ENDOTYPES and to identify patient-specific metabolic mechanisms associated with this pathology as potential therapeutic targets.

Methods: The proposed pipeline begins with our in-house mechanistic modeling method, developed by Silva-Lance et al. (2024). This method integrates patients' plasma metabolome data with genome-scale metabolic models and serves as a data augmentation technique. It generates a cubic (three-dimensional) matrix that encapsulates the diverse metabolic flux profiles characterizing each patient.
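To give a sense of the scale of this cubic matrix, the short sketch below recomputes the sizes quoted in the Results of this abstract (95 patients, 3,006 reactions, 18,036 sampled solutions, reduced to 2,656 reactions and 600 components). The helper name `n_values` is illustrative, and the assumption that the seven-reaction subset keeps all 600 components per patient is ours; the abstract does not state how the 0.0077% figure was derived.

```python
# Dimensions of the cubic matrix as quoted in the Results of this abstract.
N_PATIENTS = 95
N_REACTIONS = 3_006
N_SAMPLES = 18_036

# Dimensions after preprocessing and PCA, also quoted in the Results.
N_REACTIONS_KEPT = 2_656
N_COMPONENTS = 600


def n_values(patients: int, reactions: int, depth: int) -> int:
    """Number of entries in a patients x reactions x depth array."""
    return patients * reactions * depth


full = n_values(N_PATIENTS, N_REACTIONS, N_SAMPLES)
reduced = n_values(N_PATIENTS, N_REACTIONS_KEPT, N_COMPONENTS)

# ASSUMPTION (not stated in the abstract): the seven-reaction model
# keeps all 600 components for each of the 95 patients.
switch = n_values(N_PATIENTS, 7, N_COMPONENTS)

print(f"full matrix:      {full:,} values")
print(f"reduced fraction: {reduced / full:.2%}")
print(f"switch fraction:  {switch / full:.4%}")
```

Under these assumptions the script reports roughly 5.15 billion values for the full matrix, about 2.94% for the reduced dataset, and about 0.0077% for the seven-reaction subset, which agrees with the rounded figures in the Results.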
The pipeline then follows three steps:
i. Preprocessing: noise reduction, sparsity alleviation, outlier treatment, and numerical stability enhancement.
ii. Dimension reduction: Principal Component Analysis (PCA) is applied to distill essential features from the high-dimensional data.
iii. Classification: a machine learning ensemble technique, the Gradient Boosting (GB) algorithm, is used to classify the data.
Finally, the SHapley Additive exPlanations (SHAP) method, a game theory-based approach, is used to identify key metabolic reactions linked to distinct ENDOTYPES. These findings hold potential for guiding therapeutic applications.

Results: The modeling-based data augmentation technique generated a cubic matrix encompassing 95 patients, each characterized by 3,006 metabolic reactions and 18,036 sampled solutions, totaling over five billion data points. Through preprocessing and dimensionality reduction, this matrix was condensed to about 3% of its original size, retaining 2,656 reactions and 600 components for the 95 patients. This reduced dataset was used to train a GB model that stratified patients by ENDOTYPE with an accuracy above 99.99%. SHAP analysis of the model revealed key mechanisms underlying patient classification, identifying critical reactions, genes, and pathways as potential therapeutic targets for personalized treatments. Among these, amino acid and fatty acid metabolism emerged as the most significant processes. Specific metabolites, such as amino acids, sphingosine, and melatonin, were identified as essential for ENDOTYPE stratification. Additionally, the dopamine receptor D1 (DRD1), a gene involved in amino acid transport, was highlighted as the most significant genetic factor. Notably, a group of seven metabolic reactions was found to act as a "switch," regulating the classification of patients between groups A and D, which exhibit the lowest and highest mortality rates, respectively.
Low activity in these reactions was incompatible with the group A phenotype and favored group D, while high activity produced the opposite effect. To evaluate their importance, we trained a GB model solely on these seven reactions (representing 0.0077% of the original dataset); it achieved an accuracy of 78%, underscoring their role in modulating endothelial responses to trauma in the SHINE context. Interestingly, dysregulation of each of the seven reactions contributes either to increased production of reactive oxygen species (ROS) or to reduced ROS detoxification capacity. This convergence leads to elevated ROS levels in patients with ENDOTYPE D, in line with clinical observations in the context of SHINE. A simulated treatment targeting the seven metabolic reactions in patients with ENDOTYPE D resulted in an average reduction in the expected mortality rate of 18.5%. Additionally, a reduced version of the GB model built from the 50 most critical reactions achieved accuracies exceeding 99.99%. These streamlined models are more computationally efficient, which makes them suitable for implementation in low-cost, portable clinical devices and facilitates their integration into routine medical practice.

Conclusion: This methodology has potential applications beyond trauma care, including cancer and cardiovascular diseases, where patient heterogeneity is crucial. By integrating GEMs with machine learning, it creates a framework for understanding complex diseases such as sepsis, cancer, ... and offers a platform for precision medicine and personalized therapies.
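The game-theoretic idea behind the SHAP step described in the Methods can be illustrated with a minimal sketch that computes exact Shapley values by enumerating every feature coalition. This is not the authors' model or the SHAP library: `toy_score` is a hypothetical three-feature scoring function standing in for a trained classifier, and "absent" features are simply replaced by a baseline value. Exhaustive enumeration is only feasible for a handful of features; practical SHAP implementations rely on faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value; the rest
        # are held at the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(fluxes):
    """HYPOTHETICAL score with an interaction term; the third flux is ignored."""
    a, b, _c = fluxes
    return 2.0 * a + a * b


patient = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, patient, reference)
print(phi)
```

In this toy case the attributions sum to the difference between the patient's score and the reference score, and the ignored third feature receives zero. Ranking reactions by the magnitude of such attributions is the kind of analysis that could single out a small group of influential reactions.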