Using normative models pre-trained on cross-sectional data to evaluate intra-individual longitudinal changes in neuroimaging data
Curation statements for this article:-
Curated by eLife
eLife Assessment
This paper addresses an important topic (normative trajectory modelling), seeking to provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model. The evidence provided for the new methods is solid.
Abstract
Longitudinal neuroimaging studies offer valuable insight into brain development, ageing, and disease progression over time. However, prevailing analytical approaches rooted in our understanding of population variation are primarily tailored for cross-sectional studies. To fully leverage the potential of longitudinal neuroimaging, we need methodologies that account for the complex interplay between population variation and individual dynamics. We extend the normative modelling framework, which evaluates an individual’s position relative to population standards, to assess an individual’s longitudinal change compared to the population’s standard dynamics. Using normative models pre-trained on over 58,000 individuals, we introduce a quantitative metric termed the ‘z-diff’ score, which quantifies a temporal change in individuals compared to a population standard. This approach offers advantages in flexibility in dataset size and ease of implementation. We applied this framework to a longitudinal dataset of 98 patients with early-stage schizophrenia who underwent MRI examinations shortly after diagnosis and 1 year later. In contrast to cross-sectional analyses, which showed global thinning of grey matter at the first visit, our method revealed a significant normalisation of grey matter thickness in the frontal lobe over time, an effect undetected by traditional longitudinal methods. Overall, our framework presents a flexible and effective methodology for analysing longitudinal neuroimaging data, providing insights into the progression of a disease that would otherwise be missed when using more traditional approaches.
Article activity feed
Reviewer #2 (Public review):
Summary:
In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.
Strengths:
The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.
Author response:
The following is the authors’ response to the previous reviews.
Public Reviews:
Reviewer #2 (Public review):
Summary:
In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.
Strengths:
The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.
In the 1st revision, the authors added a simulation study to show how the performance of the classification based on z-diff scores relatively changes with different disruptions (and autocorrelation). Unfortunately, in my view this is insufficient as it only shows how the performance of using z-diff score relatively changes in different scenarios. I would suggest adding the comparison of performance to using the naïve difference in two simple z-scores to first show its better performance, which should also further highlight the inappropriate use of simple z-scores in inferring within-subject longitudinal changes.
Thank you for the suggestion for additional comparison, which we have now implemented in the simulated methods comparison, see Figure 2 and the extended text of Section 2.1.4 Simulation study.
Specifically, we have revised the simulation section to not only illustrate the performance of our z-diff method under various scenarios but also to include a direct comparison with a naïve approach that subtracts two z-scores.
The updated results demonstrate that, compared to the naïve method, the z-diff score consistently maintains a fixed false-positive rate, making it a more robust and controllable approach. Additionally, we show that under conditions of high autocorrelation, the z-diff method is significantly more sensitive in detecting smaller changes than the subtraction method. Importantly, our analysis of a sample from our dataset indicates that high autocorrelation is a prevalent characteristic in real-world data, further supporting the utility of the z-diff method.
We believe that these findings strengthen the case for adopting the z-diff method and underscore the limitations of more intuitive approaches, which, while simple, lack mathematical rigour.
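The false-positive behaviour described above can be illustrated with a toy simulation. The sketch below is not the authors' z-diff implementation: it assumes that a healthy subject's two z-scores are standard normal with correlation rho, ignores model-estimation uncertainty, and uses a hypothetical helper name (`simulate_fpr`). It only shows why thresholding the raw difference of two z-scores does not give a fixed false-positive rate, whereas a difference rescaled by its own standard deviation does.

```python
import numpy as np


def simulate_fpr(rho, n_subjects=200_000, threshold=1.96, seed=0):
    """Return (naive_fpr, rescaled_fpr) for healthy subjects with no true change.

    Assumption (not from the manuscript): visit-1 and visit-2 z-scores are
    standard normal with correlation rho, where rho < 1.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_subjects)
    # Construct z2 with unit variance and correlation rho with z1.
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_subjects)
    diff = z2 - z1

    # Naive approach: treat the raw difference as if it were itself a z-score.
    naive_fpr = np.mean(np.abs(diff) > threshold)

    # Rescaled approach: divide by the SD of the difference, sqrt(2 * (1 - rho)).
    rescaled = diff / np.sqrt(2.0 * (1.0 - rho))
    rescaled_fpr = np.mean(np.abs(rescaled) > threshold)
    return naive_fpr, rescaled_fpr


for rho in (0.0, 0.5, 0.9):
    naive, rescaled = simulate_fpr(rho)
    print(f"rho={rho:.1f}  naive FPR={naive:.4f}  rescaled FPR={rescaled:.4f}")
```

In this toy setting the naive difference is anti-conservative at low autocorrelation and overly conservative at high autocorrelation, which is consistent with the loss of sensitivity the authors report for the subtraction method when autocorrelation is high.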
Additionally, Figure 1 is hard to read and obtain the actual values of the performance measure. I would suggest reducing it to several 2-dimensional figures. For example, for several fixed values of rho, how the performance changes with different values of the true disruption (and also adding the comparison to the naïve method (difference in two z-scores)).
We believe that the Reviewer meant Figure 2; indeed, the 3-dimensional visualization, while attractive to some, may have been difficult to read, so we have now replaced it with several 2-dimensional figures as requested.
I would also suggest changing the title to reflect that the evaluation of "intra-subject" longitudinal change is the method's focus.
Thanks for the suggestion. We have now implemented it by changing the title to Using normative models pre-trained on cross-sectional data to evaluate intra-individual longitudinal changes in neuroimaging data.
We hope the changes implemented fulfill the expectations of the Reviewer.
eLife Assessment
This paper addresses an important topic (normative trajectory modelling), seeking to provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model. The evidence provided for the new methods is, however, incomplete, with the simulations validating the method needing to be extended.
Reviewer #1 (Public review):
Summary:
In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.
Strengths:
The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.
In the 1st revision, the authors added a simulation study to show how the performance of the classification based on z-diff scores relatively changes with different disruptions (and autocorrelation). Unfortunately, in my view this is insufficient as it only shows how the performance of using z-diff score relatively changes in different scenarios. I would suggest adding the comparison of performance to using the naïve difference in two simple z-scores to first show its better performance, which should also further highlight the inappropriate use of simple z-scores in inferring within-subject longitudinal changes. Additionally, Figure 1 is hard to read and obtain the actual values of the performance measure. I would suggest reducing it to several 2-dimensional figures. For example, for several fixed values of rho, how the performance changes with different values of the true disruption (and also adding the comparison to the naïve method (difference in two z-scores)).
I would also suggest changing the title to reflect that the evaluation of "intra-subject" longitudinal change is the method's focus.
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Public Review):
The models described are not fundamentally novel, essentially a random intercept model (with a warping function), and some flexible covariate effects using splines (i.e., additive models).
We respectfully but strongly disagree with the reviewer’s assessment of the novelty of our work. The models referred to by the reviewer as “random intercept models … and some flexible covariate effects” seem to relate to the estimation of normative models derived cross-sectionally as developed in and adopted from previous work, not to the work presented here. To be clear, the contributions of this work are: (i) a principled methodology to make statistical predictions for individual subjects in longitudinal studies based on a novel z-diff score, (ii) an approach to transfer information from large-scale normative models estimated on large-scale cross-sectional data to longitudinal studies, (iii) an extensive theoretical analysis of the properties of this approach, and (iv) an empirical evaluation on an unpublished psychosis dataset. Put simply, we provide the ability to estimate within-subject change in normative models, which until now have only provided the ability to show a subject's position in the normative range at a given timepoint. With the exception of the reference [13] cited in the main text, we are not aware of any available methods that can achieve this. Based on this feedback, combined with the feedback of Reviewer 2, we have now improved our introduction and clearly state our contribution right from the outset of the manuscript, whilst also shortening the introduction to make it more concise. In this work, we are trying to be very transparent in showing the reader that our method builds on a previously peer-reviewed model.
The assumption of constant quantiles is very strong, and limits the utility of the model to very short term data.
We now provide an extensive theoretical analysis of our approach (section 2.1.3), where we show that this assumption is actually not strictly necessary and that our approach yields valid inferences even under much milder assumptions. More specifically, we first provide a mathematical grounding for the assumption we made in the initial submission, then generalise our method to a wider class of residual processes and show that our original assumption of constant quantiles is not too restrictive. We also provide a simulation study to show how the practitioner can evaluate the validity and implications of this assumption on a case-by-case basis. This generalisation is described in depth in section 2.1.3.
The schizophrenia example leads to a counter-intuitive normalization of trajectories, which leads to suspicions that this is driven by some artifact of the data modeling/imaging pipelines.
We understand that the observed normalisation effects might appear surprising. As we outlined in our provisional response, we would like to emphasise that there is increasing evidence that the old neurodegenerative view of psychosis is an oversimplification and that trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode. More specifically, we have shown in an independent sample and with different methodology that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v2, now accepted in Schizophrenia Bulletin). These results are well-aligned with the results we show in this manuscript. We now added remarks on this topic into the discussion. We would also like to re-emphasise that the data were processed with the utmost rigour using state of the art processing pipelines including quality control, which we have reported as transparently as possible. The confidence that the results are not ‘driven by some artifact of the data modeling/imaging pipelines’ is also supported by the fact that analysis of a group of healthy controls did not show any significant z-diffs (see Discussion section), neither frontally nor elsewhere. If the reviewer believes there are additional quality control checks that would further increase confidence in our findings, we would welcome the reviewer to provide specific details.
The method also assumes that the cross-sectional data is from a "healthy population" without describing what this population is (there is certainly every chance of ascertainment bias in large scale studies as well as small scale studies). This issue is completely elided over in the manuscript.
Indeed, we do not describe the cross-sectional population used for training the models, as these models were already trained and published with in-depth description of the datasets used for the training (https://elifesciences.org/articles/72904). We now make this more explicit in the section 2.1.1. of the manuscript (page 7), and also more explicitly acknowledge the possibility of ascertainment bias in the simulation section 2.1.4. However, we would like to emphasise that such ascertainment bias is not in any way specific to the analyses we report. In fact it is present in all studies that utilise large scale cohorts such as UK Biobank. Indeed, we are currently working on another manuscript to address this question in detail, but given the complexity of this problem and the fact that many publicly available legacy studies simply do not record sufficient demographic information, e.g. to assess racial bias properly, we believe that this is beyond the scope of the current work.
Reviewer #2 (Public Review):
The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.
As noted above in our response to Reviewer 1, we significantly pruned the introduction, stating our objective in the first paragraph and elaborating on the topic later in the text. We hope that it is now less repetitive and easier to follow.
There are no simulation studies to evaluate whether the adjustment of the cross-sectional normative model to longitudinal data can make accurate estimations and inferences regarding the longitudinal changes. Also, there are some assumptions involved in the modeling procedure, for example, the deviation of a healthy control from the population over time is purely caused by noise and constant variability of error/noise across x_n, and these seem to be quite strong assumptions. The presentation of this work's method development would be strengthened if the authors can conduct a formal simulation study to evaluate the method's performance when such assumptions are violated, and, ideally, propose some methods to check these assumptions before performing the analyses.
This comment encouraged us to zoom out from our original assumption and generalise our method to a wider class of residual processes (stationary Gaussian processes) in section 2.1.3. We now present a theoretical analysis of our model to show that our original assumption (of stable quantiles plus noise) is actually not necessary for valid inference in our method, which broadens the applicability of our method. Of course, we also discuss in what way the original assumption is restrictive and how it aligns with the more general dynamics. We also include a simulation study to evaluate the method's performance and elucidate the role of the more general dynamics in section 2.1.4.
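As a simplified illustration of why the temporal correlation of the residual process matters, consider the following sketch. The symbols $z^{(1)}$, $z^{(2)}$ and $\rho$ are assumptions introduced here for illustration, not the manuscript's notation, and model-estimation uncertainty is ignored. If a healthy subject's z-scores at the two visits have unit variance and correlation $\rho$, then:

```latex
\begin{align}
\operatorname{Var}\!\left(z^{(2)} - z^{(1)}\right)
  &= \operatorname{Var}\!\left(z^{(2)}\right) + \operatorname{Var}\!\left(z^{(1)}\right)
     - 2\,\operatorname{Cov}\!\left(z^{(1)}, z^{(2)}\right) \\
  &= 1 + 1 - 2\rho = 2(1 - \rho),
\end{align}
% so, for jointly Gaussian z-scores and \rho < 1, the rescaled difference is standard normal:
\begin{equation}
\frac{z^{(2)} - z^{(1)}}{\sqrt{2(1 - \rho)}} \sim \mathcal{N}(0, 1).
\end{equation}
```

Under these assumptions the raw difference has unit variance only when $\rho = 0.5$, so it cannot in general be read as a z-score without accounting for the correlation between visits.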
The proposed "z-diff score" still falls in the common form of z-score to describe the individual deviation from the population/reference level, but now is just specifically used to quantify the deviation of individual temporal change from the population level. The authors need to further highlight the difference between the "z-score" and "z-diff score", ideally at its first mention, in case readers get confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a measure of "standardized difference" which kind of collides with what "z-diff" implies by its name.
We have added a note on the difference between the z-score and the z-diff score to the last paragraph of the introduction.
Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.
We have now added an interpretation of the z-score in the original model below equation 7.
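For readers unfamiliar with this decomposition, the standard Bayesian linear regression result is sketched below in common textbook notation, which may differ from the manuscript's equation 7 and ignores the warping function. Here $\beta$ is assumed to denote the noise precision, $A$ the posterior precision matrix of the regression weights, $\phi(x_*)$ the basis expansion of the covariates of a new subject, and $\hat{y}_*$ the predictive mean.

```latex
\begin{equation}
\sigma^2(x_*) \;=\;
\underbrace{\beta^{-1}}_{\text{noise (prediction) variance}}
\;+\;
\underbrace{\phi(x_*)^{\top} A^{-1} \phi(x_*)}_{\text{uncertainty from model estimation}},
\qquad
z_* \;=\; \frac{y_* - \hat{y}_*}{\sqrt{\sigma^2(x_*)}} .
\end{equation}
```

The first term does not shrink with more training data, whereas the second term typically decreases as the training sample grows and is larger for covariate values that are sparsely represented in the training data.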
It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.
This was a very useful observation; we have unified the notation and now use only variance.
The functions psi were never explicitly described. This would be helpful to have in the supplement with a reference to that in the paper.
Indeed, while describing the original model we had to make choices about how to condense the necessary information from the original model so that we can build upon it. As the phi function is only used for data transformation in the original model, we did not further elaborate on it, however, we now refer to the specific section of the original paper of Fraza et al. 2021 where it is described more in detail (https://www.sciencedirect.com/science/article/pii/S1053811921009873).
What is the goal of equations (13) and (14)? The authors should clarify what the point of writing these equations is prior to showing the math. It seems like it is to obtain an estimate of \sigma_{\ksi}^2, which the reader only learns at the end.
We corrected the formatting.
What is the definition of "adaption" as used to describe equation (15)? In this equation, I think norm on subsample was not defined.
We added a more detailed description of the adaptation after equation 15.
"(the sandwich part with A)" - maybe call this an inner product so that it is not confused with a sandwich variance estimator. This is a bit unclear. Equation (8) does have the inner product involving A and \beta^{-1} does include variability of \eta. It seems like you mean that equation (8) incorrectly includes variability of \eta and does not have the right term vector component of the inner product involving A, but this needs clarifying.
We now changed the formulation to be less confusing and also explicitly clarified the caveat regarding the difference of z-scores.
One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.
We agree with the outlined limitation in interpreting overall trends when subjects differ in their position at the first visit. However, this is a much broader challenge and is not specific to our approach. This effect is generally independent of the lifespan, but may further interact with the typical lifespan of the disease. When the z-scores are taken in the context of the cross-sectional normative models, it becomes possible to identify the overall trend of an illness across the lifespan, and an individual patient’s z-diff that is not in line with what this typical group trajectory would predict may, e.g., correspond to an early or late onset of their individual atrophy. We now state these considerations explicitly in the discussion section.
Reviewer #2 (Recommendations For The Authors):
Other minor suggestions to help improve the text:...
We thank Reviewer #2 for the list of minor suggestions to improve the text, all of which we have implemented in the manuscript.
Author response:
We thank the reviewers for the feedback on our manuscript; we are planning to address the raised concerns in the following manner:
We will be more explicit about the novelty of this method framing it more concretely within the scope of current research. From some comments of the reviewers, we understand that it is not clear that our method is an extension of an already existing method and model that has been extensively validated with pre-trained models brought online. Consequently, the details of the model as well as the training cohort are only covered briefly, referencing relevant published works on this topic. We will improve the clarity in this respect in the full responses. Nevertheless, we agree that the work would benefit from a simulation study that formally evaluates the performance of our method compared with more traditional approaches and will add it in our full responses. We will take care specifically of investigating the effect of assumptions like the centile-stability in healthy controls as suggested by the Reviewer 2.
The novelty of this work lies in introducing a mathematically transparent method to use normative modelling for evaluating studies with a longitudinal design, using normative models trained on cross sectional data. We emphasise strongly that this is otherwise not possible using current methods. Furthermore, by building on a pre-trained model, this method enjoys the benefits of big (cross-sectional) data (by the pre-trained model being fitted on an extensive population sample) without the need to have direct access to them, or a ‘big’ longitudinal dataset from the cohort at hand. This is crucial in neuroimaging, where longitudinal data are much more scarce than cross-sectional data.
We strongly disagree with the notion raised by Reviewer 1 that after the first episode cortical thickness alterations are expected to become more severe. There is now increasing evidence that: (i) trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode and (ii) that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode. Indeed, we can provide evidence for this in an independent cohort, with different analytical methodologies, where precisely this occurs (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v1, https://pubmed.ncbi.nlm.nih.gov/36805840/). In the full revision, we would be happy to provide further discussion of evidence in support of this.
We would also like to re-emphasise that the data were processed with the utmost rigour using state of the art processing pipelines including quality control.
We will take care to improve the flow of the manuscript with special attention to the theoretical part and sections highlighted by the Reviewer 2.
We agree with the challenge outlined by Reviewer 2 regarding the limitations in interpreting overall trends when subjects differ in their position at the first visit. However, this is a much broader challenge and is not specific to this study. The non-random sampling of large cohort studies is problematic for nearly all studies using such cohorts, regardless of the statistical approach used. We will explicitly acknowledge these limitations in the full response.
eLife assessment
This paper addresses an important topic (normative trajectory modelling), seeking to provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model. The evidence provided for the new methods is, however, inadequate. There is a lack of simulation studies to formally evaluate the performance of the proposed method in making accurate estimations and inferences about the longitudinal changes, the novelty of the method is not sufficiently described, and the example provided is unsatisfactory.
Reviewer #1 (Public Review):
Summary:
This paper provides a methodology for normative trajectory modeling, using cross-sectional data to set the "norms," and then applying these norms to longitudinal brain observations. An example of schizophrenia trajectories (two time points) is provided. The method assumes a Bayesian mixed effects model, which included some hyperparameters that need to be tuned. The longitudinal assumption is essentially a random intercept model, assuming that the age-based quantiles do not shift, and if they do that is a sign of disease-like trajectories.
Strengths:
Normative modeling of brain feature trajectories is an important topic. Bayesian models are a promising alternative to modeling these. Leveraging large-scale data to provide norms is also potentially useful.
Weaknesses:
The models described are not fundamentally novel: they amount to a random intercept model (with a warping function) plus flexible covariate effects using splines (i.e., additive models). The assumption of constant quantiles is very strong and limits the utility of the model to very short-term data. The schizophrenia example leads to a counter-intuitive normalization of trajectories, which raises the suspicion that this is driven by some artifact of the data modeling/imaging pipelines. The method also assumes that the cross-sectional data are from a "healthy population" without describing what this population is (there is every chance of ascertainment bias in large-scale as well as small-scale studies). This issue is glossed over entirely in the manuscript.
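One way to make the random-intercept reading concrete is sketched below. The notation (u_i, \varepsilon_{it}, \sigma^2) is assumed here for illustration and is not taken from the manuscript; the warping function and covariate effects are treated as already absorbed into the z-score.

```latex
% z_{it}: z-score of subject i at visit t under the cross-sectional normative model
% u_i: stable subject-specific position (random intercept)
% \varepsilon_{it}: visit-specific noise, assumed Gaussian and independent across visits
\begin{align}
  z_{it} &= u_i + \varepsilon_{it}, \qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2), \\
  z_{i2} - z_{i1} &= \varepsilon_{i2} - \varepsilon_{i1} \sim \mathcal{N}(0, 2\sigma^2).
\end{align}
```

Under this reading, a healthy subject's change between visits has no systematic component, so any persistent drift in u_i would be attributed to disease. This is why the constant-quantile assumption becomes harder to defend as the follow-up interval grows.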
-
Reviewer #2 (Public Review):
Summary:
In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims to solve a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.
Although the proposed method was implemented with real data and shown to be more sensitive in capturing the differences confirmed by previous studies than conventional methods, there is still a lack of simulation studies to formally evaluate the performance of the proposed method in making accurate estimations and inferences about the longitudinal changes.
Strengths:
The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the perspective of methodological development.
Weaknesses:
• The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.
• There are no simulation studies to evaluate whether the adjustment of the cross-sectional normative model to longitudinal data can make accurate estimations and inferences regarding the longitudinal changes. Also, the modeling procedure involves some assumptions, for example, that the deviation of a healthy control from the population over time is caused purely by noise, and that the variability of the error/noise is constant across x_n. These seem to be quite strong assumptions. The presentation of this work's method development would be strengthened if the authors conducted a formal simulation study to evaluate the method's performance when such assumptions are violated and, ideally, proposed some methods to check these assumptions before performing the analyses.
• The proposed "z-diff score" still falls in the common form of z-score to describe the individual deviation from the population/reference level, but now is just specifically used to quantify the deviation of individual temporal change from the population level. The authors need to further highlight the difference between the "z-score" and "z-diff score", ideally at its first mention, in case readers get confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a measure of "standardized difference" which kind of collides with what "z-diff" implies by its name.
• Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.
• It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.
• The functions psi were never explicitly described. This would be helpful to have in the supplement with a reference to that in the paper.
• What is the goal of equations (13) and (14)? The authors should clarify what the point of writing these equations is prior to showing the math. It seems like it is to obtain an estimate of \sigma_{\xi}^2, which the reader only learns at the end.
• What is the definition of "adaption" as used to describe equation (15)? In this equation, I think the norm on the subsample was not defined.
• "(the sandwich part with A)" - maybe call this an inner product so that it is not confused with a sandwich variance estimator. This is a bit unclear. Equation (8) does have the inner product involving A and \beta^{-1} does include variability of \eta. It seems like you mean that equation (8) incorrectly includes variability of \eta and does not have the right term vector component of the inner product involving A, but this needs clarifying.
• One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.
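Several of the points above (the noise-only assumption for healthy controls, constant noise variability across x_n, and the request for a simulation study) can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not from the manuscript: z-scores are modelled as a stable position plus independent Gaussian visit noise, and the change score is simplified to the z-score difference divided by its assumed standard deviation. The function names and all parameter values are hypothetical.

```python
import numpy as np

def simulate_z_pairs(n, noise_sd, rng):
    """Simulate z-scores at two visits: stable position + independent visit noise.

    noise_sd may be a scalar (constant noise) or an array of length n
    (noise that varies with a covariate). Illustrative model only.
    """
    stable = rng.normal(0.0, 1.0, n)  # subject's stable position
    z1 = stable + noise_sd * rng.normal(0.0, 1.0, n)
    z2 = stable + noise_sd * rng.normal(0.0, 1.0, n)
    return z1, z2

def standardised_change(z1, z2, noise_var):
    """Simplified change score: visit difference scaled by its assumed SD."""
    return (z2 - z1) / np.sqrt(2.0 * noise_var)

rng = np.random.default_rng(0)
n = 100_000

# Case 1: assumptions hold (constant noise SD, no true change).
z1, z2 = simulate_z_pairs(n, 0.3, rng)
score_null = standardised_change(z1, z2, 0.3 ** 2)
sd_null = score_null.std()

# Case 2: noise SD grows with a covariate x in [0, 1], but a single pooled
# noise variance is still used for scaling (assumption violated).
x = rng.uniform(0.0, 1.0, n)
noise_sd_x = 0.2 + 0.2 * x
z1h, z2h = simulate_z_pairs(n, noise_sd_x, rng)
pooled_var = np.var(z2h - z1h) / 2.0  # estimate of the mean noise variance
score_het = standardised_change(z1h, z2h, pooled_var)
sd_low = score_het[x < 0.5].std()
sd_high = score_het[x >= 0.5].std()

print(f"null SD: {sd_null:.3f}")
print(f"heteroscedastic SD, low x: {sd_low:.3f}, high x: {sd_high:.3f}")
```

In this sketch the score is close to standard normal when the assumptions hold. When the noise depends on the covariate, the same scaling gives scores that are too narrow at low x and too wide at high x, so a fixed threshold would be conservative in one range and liberal in the other. The simplified score also depends only on z2 - z1, so two subjects with the same change but different baseline positions receive the same value, which is the interpretive issue raised in the last bullet.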