Population modeling with machine learning can enhance measures of mental health
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
Biological aging is revealed by physical measures, e.g., DNA probes or brain scans. In contrast, individual differences in mental function are explained by psychological constructs, e.g., intelligence or neuroticism. These constructs are typically assessed by tailored neuropsychological tests that build on expert judgement and require careful interpretation. Could machine learning on large samples from the general population be used to build proxy measures of these constructs that do not require human intervention?
Results
Here, we built proxy measures by applying machine learning on multimodal MR images and rich sociodemographic information from the largest biomedical cohort to date: the UK Biobank. Objective model comparisons revealed that all proxies captured the target constructs and were as useful as, and sometimes more useful than, the original measures for characterizing real-world health behavior (sleep, exercise, tobacco, alcohol consumption). We observed this complementarity of proxy measures and original measures when modeling from brain signals or sociodemographic data, capturing multiple health-related constructs.
Conclusions
Population modeling with machine learning can derive measures of mental health from brain signals and questionnaire data, which may complement or even substitute for psychometric assessments in clinical populations.
Key Points
- We applied machine learning to more than 10,000 individuals from the general population to define empirical approximations of health-related psychological measures that do not require human judgment.
- We found that machine learning enriched the given psychological measures via approximation from brain and sociodemographic data: the resulting proxy measures related as well or better to real-world health behavior than the original measures.
- Model comparisons showed that sociodemographic information contributed most to characterizing psychological traits beyond aging.
Article activity feed
-
Background
**Reviewer 2. Hugo Schnack** This manuscript reports on the results of a study that can be split into two parts. For this, it should be noted that the authors consider three categories of quantities. The first category comprises the input data, or 'predictors': (a) variables derived from MRI scans and (b) rich sociodemographic variables. The second category, or 'target variables', as the authors call them, includes: (a) age, (b) fluid intelligence and (c) neuroticism. In the first part of the study, using machine learning, predictive models are built to predict the target variables from the input variables. The resulting predictions are called 'proxy measures'. For the second stage, a third category of variables is included, the 'real world health behaviours', such as alcohol use and physical activity. The authors now set out to predict these measures of behaviour based on the measures of the second category, either the 'real ones' or the 'proxies'. Thus, the question is: can alcohol use be better predicted by neuroticism determined from a questionnaire, or by the neuroticism proxy derived from MRI and sociodemographics? The main results are presented in Figure 2, and the authors conclude that the proxies perform better than the real measures. The authors carry out additional analyses, including a study of the relative importance of MRI and sociodemographics. The authors suggest that these proxies may have clinical use in the future. At first sight it may seem surprising that the proxies perform better than the real measures in capturing the associations, but, as the authors mention, the real measures suffer from (measurement) noise and non-objectivity. However, the proxies are biased (in the sense of being too simple) and are thus less capable of modeling the (true) individual variation. I would have expected a more in-depth discussion about this.
Apart from this, there is an asymmetry in the way age is treated as compared to the other two target variables, intelligence and neuroticism. Age is a very hard measure, without any measurement error, and independent of the brain. The other two targets, intelligence and neuroticism, are softer measures, and directly related to the brain. How does this influence the analyses and the results? Indeed, it is not 'predicted age' that is used as the proxy, but 'brain age delta'. I would have liked to see more explanation and discussion about this. Finally, the suggested clinical use of the proxies is not supported well enough in my opinion. Maybe the authors could add more discussion on this point as well. All in all, this is a scientifically interesting study, but I think the presentation could be improved by stating its aims more clearly and by giving more insight into certain aspects of the 'proxy modeling'.
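The two-stage design summarized in this review can be sketched on synthetic data. This is a minimal illustration, not the authors' pipeline: the latent trait, the noise levels, the single predictor, and all names (`predictor`, `target`, `behaviour`, `fit_line`) are assumptions chosen only to show how a proxy learned from a cleaner input could relate more strongly to behaviour than a noisy questionnaire score.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def fit_line(x, y):
    """One-predictor least squares; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


rng = random.Random(0)
n = 4000
# Assumption: one latent trait drives the input, the questionnaire, and behaviour.
latent = [rng.gauss(0, 1) for _ in range(n)]
predictor = [z + rng.gauss(0, 0.5) for z in latent]  # stand-in for MRI/sociodemographics
target = [z + rng.gauss(0, 1.0) for z in latent]     # noisy questionnaire score
behaviour = [z + rng.gauss(0, 1.0) for z in latent]  # stand-in for a health behaviour

half = n // 2
# Stage 1: learn the proxy on the first half of the sample.
slope, intercept = fit_line(predictor[:half], target[:half])
# Stage 2: compare associations with behaviour on the held-out half.
proxy = [slope * x + intercept for x in predictor[half:]]
proxy_r = pearson(proxy, behaviour[half:])
target_r = pearson(target[half:], behaviour[half:])
print(f"proxy vs behaviour:  r = {proxy_r:.2f}")
print(f"target vs behaviour: r = {target_r:.2f}")
```

In this toy setting the proxy's advantage comes entirely from the assumed measurement noise in the questionnaire score; it says nothing about the size of the effect in the UK Biobank data.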
-
Abstract
This paper has been published in GigaScience, where the peer reviews are published openly under a CC-BY 4.0 open license.
**Reviewer 1. Bo Cao** Reviewer Comments to Author: The manuscript describes an application of Machine Learning (ML) models for the quantification of psychological constructs, e.g. fluid intelligence and neuroticism, using multimodal MRI data from a large population cohort, the UK Biobank. The authors show that the proxy measures of these psychological constructs are more useful than the original constructs for characterizing health behaviors. Overall, the manuscript is well written. The research questions are clearly stated and are of practical importance. However, the reviewer has the following concerns.
Major Concerns:
- On page 3 (left, lines 3-6 of the main text), the author claims that "Our findings suggested that psychological constructs can be approximated from brain images and sociodemographic variables - inputs not tailored to specifically measure these constructs.". The reviewer has concerns about this claim. Although Figure 3 shows the model's performance in predicting age, fluid intelligence and neuroticism using neuroimaging data and different areas of sociodemographic data, the performance of the models in predicting the psychological constructs, fluid intelligence and neuroticism, may not be good enough to support such a claim.
- In Figure 2, the proxy measure and the original measure show similar associations with the health phenotypes for fluid intelligence (center plot) and neuroticism (right plot), but not for the brain age delta. The main reason seems to be that, in the association analysis, the measures of the health phenotypes are de-confounded for their dependence on age (see the subsection "Out-of-sample association between proxy measures and health-related habits" of the "statistical analysis" section). However, it seems that the same procedure is not applied in the association analysis of fluid intelligence and neuroticism. The estimated brain age or brain age gap depends on age. Thus, we need to either correct the brain age or brain age gap for its dependence on age, or de-confound the health phenotype's dependence on age. If the author wants to derive the proxy measures of the psychological constructs in the same way as the brain age (or biological age), the same procedure should be used to correct each proxy measure's dependence on the original measure.
- Based on Figure 2, the author claims that the proxy measures have an enhanced association with health behavior compared to the original measures. If we focus only on the center and right panels of Figure 2, the difference is not that obvious, and we do not know whether it is significant. A better approach may be to correct the predicted fluid intelligence and predicted neuroticism for their dependence on the original measures, or to de-confound the original measures' effects on the health behaviors.
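The age correction discussed in these concerns can be illustrated with a minimal sketch. It assumes synthetic data and a plain one-predictor least-squares residualization, which may differ from the de-confounding the authors actually used; the coefficients and the names `delta`, `behaviour`, and `residualize` are hypothetical.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def residualize(y, confound):
    """Return y minus its least-squares linear fit on the confound."""
    mc, my = mean(confound), mean(y)
    slope = sum((c - mc) * (v - my) for c, v in zip(confound, y)) / sum(
        (c - mc) ** 2 for c in confound
    )
    return [v - (my + slope * (c - mc)) for c, v in zip(confound, y)]


rng = random.Random(1)
n = 3000
age = [rng.uniform(45, 80) for _ in range(n)]
# Assumption: predicted age shrinks toward the mean, so the delta depends on age.
delta = [-0.4 * (a - 62.5) + rng.gauss(0, 3) for a in age]
# Assumption: the behaviour depends on age only, not on the delta itself.
behaviour = [0.05 * a + rng.gauss(0, 1) for a in age]

raw_r = pearson(delta, behaviour)
delta_adj = residualize(delta, age)
behaviour_adj = residualize(behaviour, age)
adjusted_r = pearson(delta_adj, behaviour_adj)
print(f"raw association:          r = {raw_r:.2f}")
print(f"age-adjusted association: r = {adjusted_r:.2f}")
```

Here the raw association is produced entirely by the shared dependence on age and largely disappears once both variables are residualized. The analogous correction for fluid intelligence or neuroticism would pass the original measure as `confound` instead of age.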
Minor concerns:
On page 1 (two lines before reference 15), it seems that "to learn" is misspelled as "tolearn".
The author stated that there are repeated measures for subjects in the UK Biobank data. How does the author handle this in the data preprocessing: by using the last measurement, the first one, or something else?
The author selected 5,587 of the 10,975 subjects for modeling, while the remainder were used for the out-of-sample association analysis. This split seems arbitrary. Can the author also show a learning curve, in which x is the sample size and y is the model's performance, to justify that the chosen sample is large enough to train an accurate ML model?
In the first paragraph of the "Methods" section, there are duplications.
In the "Data acquisition" subsection, under the "target measures" paragraph, the age at baseline recruitment is used as the outcome. However, in general, there is a gap between the age at baseline and the age when the MRI images were acquired. Does this matter for the data analysis in this manuscript?
For the classification analysis (paragraph "Classification analysis" in the subsection "Comparing predictive models to approximate target measures", and the paragraph above the "Discussion" section), the thresholds selected to discretize the outcome variables are somewhat arbitrary.
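The learning curve requested among these minor concerns (sample size on x, held-out performance on y) could be computed along the following lines. This is a sketch on synthetic data with a one-predictor least-squares model; the training sizes, the data-generating process, and the helper names are assumptions and do not reflect the authors' models or sample.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """One-predictor least squares; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


rng = random.Random(2)
n = 3000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

# Fixed held-out set; training subsets of increasing size come from the rest.
x_test, y_test = x[2000:], y[2000:]
sizes = [10, 50, 200, 800, 2000]  # hypothetical training sizes
curve = []
for k in sizes:
    slope, intercept = fit_line(x[:k], y[:k])
    predictions = [slope * xi + intercept for xi in x_test]
    curve.append((k, r2(y_test, predictions)))

for k, score in curve:
    print(f"n_train = {k:5d}  held-out R^2 = {score:.3f}")
```

A curve that has flattened before the chosen training size would support the split; one that is still rising would suggest that more training data could help. For multivariate models, scikit-learn's `learning_curve` in `sklearn.model_selection` offers a cross-validated version of the same idea.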
Comments on Re-Review: The substantial revision improved the paper and is appreciated by the reviewer. The details have been enhanced. However, after reviewing all the comments from the other reviewers and the feedback from the author, the reviewer still has some concerns about the basic logic of the paper and its presentation. Figure 1 is helpful (BTW, the font is too small and smaller than in the other figures). But if we consider the current approach again, when the machine learning (ML) model has perfect performance in generating the so-called "proxy measures", these measures should exactly match each individual's age, fluid intelligence and neuroticism. What the author claims about proxy measures providing a better assessment of other health-related variables might simply be due to the imperfection of the ML predictions, i.e., the "residuals" with respect to the real targets (age, fluid intelligence and neuroticism). The author may need to address this and present the logic of the paper more clearly to help readers understand its main point and results. In this regard, Figure 1 is incomplete in addressing the full flow of the paper, which in the reviewer's opinion is necessary for such a seemingly complex paper.