Machine learning ensemble identifies distinct age-related response to spaceflight in mammary tissue


Abstract

Spaceflight presents unique environmental stressors, such as microgravity and radiation, that significantly affect biological systems at the molecular, cellular, and organismal levels. Female astronauts, in particular, face an increased risk of developing breast cancer due to exposure to ionizing radiation and other spaceflight-related factors. Age also plays a crucial role in the mammary gland’s response to these stressors, with younger organisms generally exhibiting more efficient response mechanisms than older ones. In this study, we utilized an ensemble of machine learning algorithms to analyze gene expression profiles from mammary tissue of young and old female mice exposed to spaceflight to predict age (old vs young) and condition (spaceflight vs ground control). Using the genes our ensemble identified as most predictive, we investigate the molecular pathways involved in spaceflight-related health risks, particularly in the context of breast cancer and cardiovascular health. We identified age-dependent differences in the gene expression profiles of spaceflight-exposed mice compared to ground control.

Specifically, younger mice exhibited enriched pathways related to cellular structure, while older mice showed activation of pathways involved in cortisol synthesis and muscle contraction. All mice responded to spaceflight with evidence of elevated lipid metabolic function. These findings highlight the critical role of age in modulating the response to spaceflight-induced stress and suggest that these molecular pathways may contribute to differential outcomes in tissue homeostasis, cardiovascular and metabolic disorders, and breast cancer susceptibility.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/15177163.

    This review is the result of a virtual, collaborative live review discussion organized and hosted by PREreview and JMIR Publications on March 21, 2025. The discussion was joined by 21 people: 3 facilitators from the PREreview Team, 1 member of the JMIR Publications team, 1 author, and 16 live review participants, including 3 who agreed to be named: Matthew W Darlison, Luciana Gallo, and Meghal Gandhi. The authors of this review dedicated additional asynchronous time over the course of two weeks to help compose this final report using the notes from the Live Review. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this preprint.

    Summary:

    In this study, a small existing dataset of mammary gland gene expression in female mice was subjected to various combinations of AI binary classification techniques to distinguish young vs. old female mice and those exposed or not exposed to a prolonged stay in space. The authors applied machine learning methods to analyze data on mammary gland gene expression in mice newly returned from a prolonged stay in Earth orbit, as compared to controls remaining on the ground, in order to identify which genes were affected by the spaceflight experience and how the age of a mouse influenced this response. The underlying theory is that a cell's "strategy" for adapting to certain stressors may change as a mouse ages: a qualitative change rather than an overall quantitative deterioration of resilience.

    Discovery of key genes involved pinpointing which ones stood out as enabling classification of mice as either young or old and, separately, as having flown in space or not. Because of the small size of the dataset (which had been collected for other research), a conventional random forest approach lacked sufficient power to identify critical genes. Instead, the authors describe trying various ensembles of machine learning tools until eventually selecting several candidate genes. By associating those genes with metabolic pathways, they then suggest a plausible description in which cells of younger mice activate mechanisms related to cell structure and cell adhesion, while cells of older mice activate pathways involved in cortisol synthesis and cardiac muscle contraction.

    The application of innovative computational techniques, such as machine learning algorithms, to better understand gene expression offers fresh insight into spaceflight in animal models. More specifically, the research sheds new light on molecular pathways implicated in spaceflight-related health risks. This is particularly important for understanding the pathogenesis of a large number of diseases, such as cancer, that are often characterized by the development of abnormal tissues. However, the study has a few shortfalls, as outlined below. A section of the paper should perhaps be devoted to the limitations of the research. A brief ethics statement could provide more clarity about the research approach. It should be made clear early on that the analysis was done "in silico." Additionally, experimentation on mice may overlook biological properties specific to humans; therefore, the arguments should be scoped to mice only.

    List of major concerns:

    • The title should be more specific with respect to the source of mammary tissue: identify "mouse mammary gland tissue" in the title or, perhaps, simply "murine mammary tissue."

    • While the methodology is interesting and the findings certainly warrant further study, this should be clearly identified as formative research: there was no pre-registration of hypotheses and methods, and the findings (the list of key genes and of pathways differing according to age) are only suggestive, not robust or convincing. Accordingly, some of the detail about the experiences of the mice and their physiological values is beside the point, so we suggest it be moved to a 'Supplements' section, along with more specifics about machine learning parameters, etc., that could help researchers attempting similar approaches.

    • With respect to the OSD-511 dataset: the details of Rodent Research Reference Mission 1 (RRRM-1) need revision. The text mentions 40 female BALB/cAnNTac mice, while the group breakdown gives a total of 43 animals: 21 younger mice and 22 older mice. Moreover, the 8 YNG mice that were kept in standard cages were exposed to different conditions from the 7 OLD mice that were housed in flight hardware.

    • In addition, it was mentioned that each group of space-flown (FLT) mice had a corresponding ground control (GC) group, but it is not clear which basal controls (10 mice euthanized one day post-launch) were compared with which group. This is important for explaining the single group called "non-flight" that is mentioned later in the paragraph. Please also indicate whether these details from the original experiment are unavailable to the authors.

    • In the Discussion section, or in a separate Limitations section, consider explicitly pointing out that the data from the experimental mice were collected just once, after 40 days in space and 2 days of post-return recovery. They therefore provide only cross-sectional data and do not capture changes that could be evident while in space or longer after return from space. Also, the description for Figure 1 mentions Figures 1e and 1f, which are not present in the figure.

    • The small sample size should be acknowledged; it means the resulting models may not generalize well to unseen data in downstream tasks.
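
    To illustrate the generalization concern, one common way to estimate out-of-sample performance on very small datasets is leave-one-out cross-validation. The sketch below is a minimal Python example using made-up two-feature data and a simple nearest-centroid classifier. The data, labels, and function names are hypothetical, and this is not the authors' pipeline.

```python
# Leave-one-out cross-validation with a nearest-centroid classifier.
# All data and names below are hypothetical, for illustration only.

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_nearest_centroid(train_X, train_y, x):
    """Predict the label whose class centroid lies closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):  # sorted for deterministic tie-breaking
        members = [row for row, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict_nearest_centroid(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Toy, well-separated "expression" values for two hypothetical genes.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = ["YNG", "YNG", "YNG", "OLD", "OLD", "OLD"]

loo_accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

    With so few samples, each held-out prediction shifts the estimate by a large step (here one sixth), so any accuracy reported this way should be read alongside its uncertainty.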

    List of minor concerns:

    • The title could be enhanced to make it clear that this was an experiment based on a model organism (mouse) and not human.

    • Reviewers acknowledge the availability of details that enable the reproducibility of the study, such as publicly accessible data sources and a detailed description of the data handling and analysis procedures. However, the reviewers wondered whether the source code could also be made available to further enhance reproducibility.

    • The stated total number of mice used in the study does not match the sum of the individual group numbers. Authors need to cross-check the numbers to ensure that they tally.

    • Clarify the composition of the control cohort, refer to those mice in a consistent way, and discuss differences that were found to exist between the subsets of controls.

    • On page 4, under the Data Transformation section, it is stated that "four filtering methods were performed", but Figure 2b only represents three filters. For the sake of consistency, kindly clarify whether the fourth filtering method was used but not included in the figure, or whether there is a mistake in either the figure or the text.

    • On page 6, in the last paragraph, a linear regression model was used to predict the weight of mice at euthanasia, but the significance of this prediction was not discussed. Add a brief discussion of the significance of the model, which may include statistical validation such as p-values and/or confidence intervals, for a better understanding of its applicability.

    • On page 15, under the Conclusion section, it is mentioned that "The dysregulation of ECM remodeling, cytoskeletal function, and stress response pathways was observed in radiation-exposed mice", but radiation exposure was not the intervention applied. Revise this statement to accurately reflect the intervention applied in this study (spaceflight) and ensure the conclusion is consistent with the experimental conditions.

    • In the Discussion Section, some results are repeated instead of being analysed in depth. Focus more on interpreting the results, compare them with similar studies, and discuss their significance.

    • Accuracy is the only model performance metric reported. Add other metrics, including AUC-ROC, sensitivity, specificity, and F1 score, to enhance the assessment of the model's predictive ability.

    • Under the algorithms discussion, remove possessive apostrophe from the "1950's."

    • It may help to add a statement making explicit whether ethics approval was necessary for the study. In addition, it would add value to discuss the ethical implications of collecting the dataset used in the manuscript, with reference to any discussion in previous publications or from the authors who collected the original data.
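
    As an illustration of the additional performance metrics requested above, the following minimal Python sketch computes sensitivity, specificity, F1 score, and AUC-ROC for a binary classifier. The labels, scores, threshold, and function names are hypothetical and are not taken from the preprint. The AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating a ROC curve.

```python
# Binary classification metrics beyond accuracy, using only the standard library.
# Labels, scores, and the 0.5 threshold are hypothetical, for illustration only.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }

def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = spaceflight (FLT), 0 = ground control (GC).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

    Reporting these values per cross-validation fold, rather than as a single pooled number, would also show how stable the classifier is on such a small dataset.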

    Concerns with Figures and Tables

    • Most figures have poor resolution, which makes them difficult to understand or interpret. It would be helpful to regenerate the figures with better resolution.

    • It would be helpful to add details to the captions to include what's represented in each panel and any elements of statistics.

    • Creating a table to present the various groups and their characteristics, including GC, would help improve readability. 

    • Figure 1 lacks an adequate explanation of each panel that will clarify what they represent.

    • Table 1 is not clear, making it difficult to read.

    • The top and left parts of Figure 7 are cropped and it's possible important information is omitted.

    • The legend refers to plots by layout (left/right), duplicating the role of (a)-(d) labels. Also, plot titles are not the most prominent text and are not referenced in the text.

    • In Figure 4, the term "accuracy" is used without clarification.

    • Abbreviations used in Figures 2 and 3 are not explained.

    • The Figure 3 legend does not clearly describe the difference between the left and right diagrams.

    • The manuscript refers to Table 1 sub-sections "e" and "f," which are not present. Some figures are also unclear and not explanatory enough.

    • The fonts in Figure 5 are too small to read, and part of the legend is cropped.

    • In Figure 1, the caption states that the left plots represent ground mice and the right plots represent space mice, which is not reflected in the figure.

    • On page 4, the PCA statement interpreting Figures 1a and 1d is misleading. The statement suggests that both Figure 1a and Figure 1d show PCA for spaceflight, whereas Figure 1a only represents ground mice.

    • The text for figure 1 describes Figure 1e and 1f, but these panels are not present.

    Additional comments 

    • Consider revising the title and abstract to identify that the study was conducted with data collected in a model organism or murine model.

    • On the 2nd page, 2nd sentence of the 1st paragraph: "Female astronauts in particular have an increased risk of breast cancer due to exposure to galactic cosmic radiation (7)." Please revise the reference, as Kumar, K. et al. (2024) did not investigate this question or reach this conclusion. https://doi.org/10.3390/cancers16233954

    • On the 2nd page, in the last sentence of the 1st paragraph: "Female astronauts ………this increased vulnerability." Please provide a reference for this statement.

    • On the 2nd page, 2nd paragraph: "Machine learning (ML) has been leveraged but to a much lesser extent (15)." Please revise the reference (Larrañaga et al., 2006), as ML's role in bioinformatics has expanded widely since 2006.

    • Page 6, 2nd paragraph: the text states that "The support vector machine was created by Hava Siegelmann and Vladimir Vapnik" and cites reference 26 (Cortes & Vapnik, 1995), whereas the work by Siegelmann and Vapnik was published in 2001 (A. Ben-Hur, D. Horn, H.T. Siegelmann, V. Vapnik, Support vector clustering, J. Mach. Learn. Res. 2 (Dec 2001) 125–137). https://www.jmlr.org/papers/volume2/horn01a/horn01a.pdf

    • Page 11, Pathway enrichment analysis: Please define the abbreviation KEGG (Kyoto Encyclopedia of Genes and Genomes).

    • Page 11, Pathway enrichment analysis: Please define the abbreviation FDR (False Discovery Rate).

    Concluding remarks

    We thank the authors of the preprint for posting their work openly for feedback. We also thank all participants of the Live Review call for their time and for engaging in the lively discussion that generated this review.

    • In the Data Transformation section, the groups (FLT vs GC and YNG vs OLD) are introduced for the first time in the manuscript. These categories are defined later, but it would be good to spell out the names the first time they are mentioned. The same applies to any other acronym used.

    • The article does not include a limitations section. It would help the reader to emphasize the limitations of the methods.

    Competing interests

    The authors declare that they have no competing interests.