Scale-sensitive Mouse Facial Expression Pipeline using a Surrogate Calibration Task


Abstract

Emotions are complex neuro-physiological states that influence behavior. While emotions have been instrumental to our survival, they are also closely associated with prevalent disorders such as depression and anxiety. The development of treatments for these disorders has relied on animal models; in particular, mice are often used in preclinical testing. To compare effects between treatment groups, researchers have increasingly used machine learning to help quantify behaviors associated with emotionality. Previous work has shown that computer vision can be used to detect facial expressions in mice. In this work, we create a novel dataset of depressive-like mouse facial expressions using varying lipopolysaccharide (LPS) dosages and demonstrate that a machine learning model trained on this dataset was able to detect differences in magnitude across dosage amounts.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/13162263.

    This review emerged from an in-person discussion during the Open Reviewers Workshop, facilitated by the PREreview Team at FSCI 2024 at UCLA. Three workshop participants and two facilitators, none of whom were domain experts, contributed to this review as the final activity of a three-day session on best practices in peer review. The preprint was chosen by the PREreview team for its brevity and accessibility, and in response to a feedback request from the preprint authors. We hope the authors and the broader community find this feedback helpful.

    Summary 

    The study presents a machine-learning model designed to quantify emotion-related behaviors in mice through a novel dataset of depressive- and anxiety-like facial expressions. Male and female mice were administered either saline (control) or one of several doses of lipopolysaccharide (LPS) to induce anxiety- or depressive-like symptoms in a controlled manner. The mice were filmed at several time points before and after injection while freely moving in a transparent view box.

    The methodology involved using a DeepLabCut model to extract facial landmarks from the videos, manual scoring and processing alignment, and ResNet-based mean aggregation of frame-level predictions for testing. The results demonstrated the model's success in distinguishing between groups administered the vehicle and varying doses of LPS.
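
    As a reading aid, the mean-aggregation step as we understood it can be sketched in a few lines. The model choice, class count, and tensor shapes below are our assumptions for illustration, not the authors' implementation:

    ```python
    # Minimal sketch (not the authors' code): a ResNet classifies each
    # video frame, and the per-frame probabilities are averaged to give
    # one prediction per video.
    import torch
    import torchvision.models as models

    model = models.resnet18(weights=None, num_classes=2)  # assumed: 2 classes
    model.eval()

    frames = torch.rand(32, 3, 224, 224)  # placeholder batch of 32 frames

    with torch.no_grad():
        logits = model(frames)                # (32, 2) per-frame logits
        probs = torch.softmax(logits, dim=1)  # per-frame class probabilities
        video_pred = probs.mean(dim=0)        # mean aggregation over frames
    ```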

    While the study shows promise as an inexpensive and openly available method for quantifying emotional states in mice, which is crucial for preclinical testing of new treatments, it lacks sufficient detail for a comprehensive evaluation of its accuracy and impact.

    Below, we summarize feedback from participants of the FSCI 2024 workshop, offering constructive suggestions for improving the manuscript.

    Major concerns and feedback

    1. Participants generally found the manuscript well-written and relatively easy to understand, even for a non-expert audience. The introduction was particularly effective in providing the context for the study. However, the manuscript lacks essential information necessary for readers to properly assess the study and for other researchers to reproduce and validate the findings.

      For example, the Methods section did not provide any information about the statistical analysis. The Results section mentions post hoc Mann-Whitney U tests to compare groups, but the table shown on page 4 reports results of a Kruskal-Wallis test, which is presumably the correct omnibus test, followed by a Bonferroni correction or another post hoc correction for multiple comparisons. The statistical approach should be detailed in the Methods section, with explicit reporting and justification of each statistical test used (a minimal sketch of such a workflow appears after this list).

    2. Furthermore, the Methods section should describe the behavioral setup in which the mice were recorded. Figure 1 provides a picture of that setup, but it is unclear what the components are and what task, if any, the mice were performing during the recordings. In various parts of the manuscript, there is mention of a "surrogate training task" (in the Pipeline paragraph) or a "surrogate classification task" (in the Results section). It would be helpful to know more about those tasks and how they were used.

    3. In general, the figures and tables lack the necessary information to inform the reader about their purpose. Specifically, it would be helpful if the authors provided more context both in the text and in the captions to explain the main message conveyed by Figures 1 and 2. Additionally, Table 1 lacks a caption, and like Figure 1, it is not referenced in the text.

      For Figure 6, the y-axis reports the "Negative Affective Score," but this term is not mentioned in the text, and the caption does not provide enough information to understand the graph. Do the shaded regions around the line represent the SEM, the SD, or something else? What analysis was used to produce this figure?

      Figure 7 lacks labeling and an explanation of the heatmap axes.

      The table on page 4 does not have a number or a caption and is not referenced in the text.

    4. The caption of Figure 7 mentions that occlusion-based explainability techniques were used. It would be beneficial to explain these techniques in detail in the Methods section, rather than leaving the reader to figure them out on their own (a generic sketch of the idea appears after this list).

    5. In line with previous comments regarding the general absence of necessary information, the Discussion section is very brief and does not help the reader contextualize the study and its results within the broader literature. It would be beneficial to expand the discussion to include information on how this pipeline differs from others, how this model can be applied to other contexts, and how future studies can build upon this work. This would provide a more comprehensive understanding of the study's significance and potential impact.
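
    To make the statistical suggestion in point 1 concrete, here is a minimal sketch of the workflow we would expect to see reported, using placeholder data and illustrative group labels:

    ```python
    # Sketch of the analysis suggested in point 1: an omnibus Kruskal-Wallis
    # test across all groups, then Bonferroni-corrected post hoc
    # Mann-Whitney U tests between group pairs. Data below are placeholders.
    from itertools import combinations

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    groups = {
        "saline": rng.normal(0.2, 0.1, 20),
        "low_lps": rng.normal(0.4, 0.1, 20),
        "high_lps": rng.normal(0.7, 0.1, 20),
    }

    h, p = stats.kruskal(*groups.values())
    print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4g}")

    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        u, p = stats.mannwhitneyu(groups[a], groups[b])
        p_corr = min(p * len(pairs), 1.0)  # Bonferroni correction
        print(f"{a} vs {b}: U={u:.1f}, corrected p={p_corr:.4g}")
    ```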
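
    Likewise, for point 4, the occlusion technique is simple enough to describe in a few sentences. A generic sketch follows, where `predict_prob` is a hypothetical stand-in for any image classifier returning the probability of the predicted class:

    ```python
    # Generic sketch of occlusion-based explainability (point 4): slide a
    # gray patch across the image and record how much the model's confidence
    # in the predicted class drops at each location.
    import numpy as np

    def occlusion_map(image, predict_prob, patch=16, stride=8, fill=0.5):
        """Return a heatmap of confidence drops; `predict_prob` is hypothetical."""
        h, w = image.shape[:2]
        baseline = predict_prob(image)
        rows = (h - patch) // stride + 1
        cols = (w - patch) // stride + 1
        heat = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                occluded = image.copy()
                y, x = i * stride, j * stride
                occluded[y:y + patch, x:x + patch] = fill  # mask this region
                heat[i, j] = baseline - predict_prob(occluded)  # confidence drop
        return heat  # high values mark regions the model relies on
    ```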

    Minor concerns and feedback

    1. The discussion participants appreciated the adoption of a balanced control group with both male and female mice randomly assigned to each group on a daily basis. However, it would be helpful to know more about how that randomization was performed (a hypothetical example appears after this list). Furthermore, there is no mention of a comparison between male and female mice. Was sex not a variable in the multiple-comparison analysis, or was there no significant difference, so it was not reported in the results?

    2. Some discussion participants wondered whether there were sufficient data to generate a robust prediction model. Mentioning what sample size (N) is generally required for this kind of experiment would help contextualize the current results.

    3. In the Methods section it is stated that "Filming each class followed best practice as it helped prevent models from using changing background information to make predictions." It is unclear what "best practices" the authors are referring to.

    4. Discussion participants really appreciated the detailed GitHub repository containing the source code for the data analysis. When looking at this page https://github.com/A-Telfer/mouse-facial-expressions-2023/blob/main/notebooks/01-video-processing.ipynb we noticed a small error in the code (a value error) that would presumably prevent someone from running the script. Additionally, this link https://github.com/A-Telfer/mouse-facial-expressions-2023/blob/main/notebooks/05-dataset-validations.ipynb leads to an "Invalid notebook" error message on GitHub. Using nbstripout to clear notebook outputs before committing to GitHub was suggested (see the sketch after this list).

    5. The manuscript reports that the data can be made available upon request from the primary author. However, it is unclear from the GitHub repository whether some data is already available to allow the scripts to be run. For example, the discussion participants found a `dataset_df` pickle file and wondered what it was. It would be useful if the authors could say more about what data is already available, in what format, and where, as well as what data can be expected upon request. Furthermore, if there is a way to make the data openly available under the FAIR principles, that would significantly improve the accessibility of this study.

    6. To improve readability, it would be helpful to order the figures in the sequence they are referenced in the text. Currently, it is distracting to have to refer back to a figure mentioned later in the text but displayed and numbered earlier on the page. Reordering the figures for better alignment with the text will enhance the overall flow and clarity of the manuscript.

    7. The caption of Figure 5 is a bit confusing. The meaning of the parenthetical sentence "e.g. poor quality frames perhaps to activities such as grooming can be ignored without the need for further filtering steps" is unclear.

    8. In the Pipeline paragraph, the sentence "The surrogate training task selected was for the model to classify high-dose LPS from the saline controls at Preinjection and 4 hours Post Injection" ends with a "1" that links to an unclear part of the paper. It would be helpful to replace that "1" with a reference to a specific figure or table rather than a number linking to an unidentified part of the text.
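
    To illustrate what minor point 1 asks for, reporting could be as brief as naming a seeded, sex-balanced assignment procedure. The sketch below is hypothetical and is not the authors' method:

    ```python
    # Hypothetical sketch of a seeded, sex-balanced randomization of the
    # kind minor point 1 asks the authors to describe.
    import numpy as np

    def assign_groups(mice, sexes, groups, seed=0):
        rng = np.random.default_rng(seed)
        assignment = {}
        for sex in sorted(set(sexes)):  # balance assignments within each sex
            ids = [m for m, s in zip(mice, sexes) if s == sex]
            rng.shuffle(ids)
            for i, mouse in enumerate(ids):
                assignment[mouse] = groups[i % len(groups)]  # round-robin
        return assignment

    print(assign_groups(["m1", "m2", "m3", "m4"],
                        ["F", "F", "M", "M"],
                        ["saline", "high_lps"]))
    ```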
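
    On minor point 4, running `nbstripout --install` inside the repository registers a git filter that clears outputs on commit. Its effect can be approximated with `nbformat`, as in this sketch (the notebook path is taken from the repository links above):

    ```python
    # Approximation of what nbstripout does (minor point 4): clear code-cell
    # outputs and execution counts so committed notebooks stay valid and small.
    import nbformat

    def strip_outputs(path):
        nb = nbformat.read(path, as_version=4)
        for cell in nb.cells:
            if cell.cell_type == "code":
                cell.outputs = []
                cell.execution_count = None
        nbformat.write(nb, path)

    strip_outputs("notebooks/05-dataset-validations.ipynb")
    ```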

    The discussion participants and facilitators appreciated the opportunity to review this preprint during the workshop. They commend the authors for sharing their work as a preprint and making their source code available, promoting transparency and collaboration in the research community.

    Competing interests

    VS and DS were the facilitators of the workshop in which this preprint was reviewed and they are both part of the PREreview team.