Predicting the retinotopic organization of human visual cortex from anatomy using geometric deep learning


Article activity feed

  1. ### Reviewer #3:

    This paper presents a neural-network-based approach to predicting the retinotopic organization of the human visual cortex from structural MRI data. The authors promote the use of non-Euclidean/geometric deep learning methods for this problem. They apply their technique to the HCP data and show some interesting results, which they claim demonstrate that functional organization in the visual system can be predicted at the individual level. For me, the paper has several substantial and important flaws.

    First, one of the most important contributions of the paper is the promotion of geometric deep learning. To me, the value of this framework has not been demonstrated by the experiments: to assess the additional boost afforded by geometric techniques, one would need to establish a baseline with a Euclidean model. Without this comparison, it is impossible to evaluate the value of this innovation.

    Second, in general, I did not find the quality of the individual-level predictions or the presented quantitative results convincing or impressive. In Figure 3, for example, I'd like to see the underlying sulcal geometry of each subject to assess the value of the presented "individualized" predictions. Also, the quality of the predictions, as the authors acknowledge, is significantly reduced in large parts of the cortex, including higher-order areas. Importantly, though, it is not clear how much of the individual variability is truly captured by these predictions. For example, the error maps in Figure 6 for the "shuffled" and "constant" cases look very similar to the actual error maps, and quantitatively, the overall error values are very close for these cases. This suggests that the predicted retinotopic maps are not much better than a simple group-average retinotopic map. One way to counter this concern would be to conduct a fingerprinting/identifiability experiment and demonstrate that the predicted maps are much closer to the observed/measured (ground-truth) maps of the same individual than to those of other individuals. Without such an analysis, it is impossible to assess how much of the individual variation is captured.
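
    A minimal sketch of such a fingerprinting analysis (the function and array names are hypothetical; `pred` and `truth` are assumed to hold per-subject maps over a common set of vertices):

    ```python
    import numpy as np

    def identification_accuracy(pred, truth):
        """Fraction of subjects whose predicted map is closest to their own
        measured map rather than to any other subject's measured map.

        pred, truth: arrays of shape (n_subjects, n_vertices).
        """
        n = pred.shape[0]
        # pairwise mean absolute error: errors[i, j] compares the prediction
        # for subject i with the measured map of subject j
        errors = np.abs(pred[:, None, :] - truth[None, :, :]).mean(axis=-1)
        return np.mean(errors.argmin(axis=1) == np.arange(n))
    ```

    Identification accuracy well above chance (1/n) would indicate that the predictions capture genuinely individual features.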

    The proposed smooth L1 loss was not properly justified and seems inappropriate: the threshold of 1 seems arbitrary, and the cyclical nature of polar angle should in fact call for a cyclical loss function. This is a minor concern, however.
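
    For reference, a cyclical variant would apply the same loss to the wrapped angular residual rather than the raw difference (a minimal PyTorch sketch, assuming polar angles in degrees):

    ```python
    import torch
    import torch.nn.functional as F

    def cyclical_smooth_l1(pred_deg, target_deg, beta=1.0):
        """Smooth L1 loss on the angular residual, wrapped into [-180, 180).

        Predicting 359 deg for a 1 deg target then costs 2 deg, not 358 deg.
        """
        diff = (pred_deg - target_deg + 180.0) % 360.0 - 180.0
        return F.smooth_l1_loss(diff, torch.zeros_like(diff), beta=beta)
    ```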

    The need for dropout was also not demonstrated. Was there a concern about overfitting? Showing learning curves (for training and validation data) would help with that.
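
    A generic sketch, assuming the per-epoch losses were recorded during training:

    ```python
    import matplotlib.pyplot as plt

    def plot_learning_curves(train_losses, val_losses):
        """Overlay per-epoch losses; a validation curve that rises while the
        training curve keeps falling is the classic signature of overfitting."""
        epochs = range(1, len(train_losses) + 1)
        plt.plot(epochs, train_losses, label="training")
        plt.plot(epochs, val_losses, label="validation")
        plt.xlabel("epoch")
        plt.ylabel("loss")
        plt.legend()
        plt.show()
    ```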

    Choosing the best model based on validation loss can be improved with a "deep ensemble" strategy.
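
    In its simplest form, a deep ensemble trains several models from different random initializations and averages their predictions at test time (a sketch; `make_model` and `train_model` are hypothetical stand-ins for the authors' pipeline):

    ```python
    import torch

    def ensemble_predict(x, n_members=5):
        """Average test-time predictions over independently trained models."""
        preds = []
        for seed in range(n_members):
            torch.manual_seed(seed)  # a different initialization per member
            model = make_model()     # hypothetical model constructor
            train_model(model)       # hypothetical training loop
            model.eval()
            with torch.no_grad():
                preds.append(model(x))
        return torch.stack(preds).mean(dim=0)
    ```

    Note that for polar angle the circular mean (averaging unit vectors) would be the appropriate aggregation.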

    In the shuffling procedure, the spatial correlation structure seems to have been destroyed. A better approach would be to randomly deform or rotate the structural image.
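
    A rotation-based null could, for instance, apply a random 3D rotation to the spherical vertex coordinates and resample the features, which preserves their spatial autocorrelation (a sketch; nearest-neighbour resampling is a simplification):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.spatial.transform import Rotation

    def rotated_null_features(features, sphere_coords, seed=0):
        """Resample per-vertex features after a random rotation of the sphere.

        features: (n_vertices, n_channels); sphere_coords: (n_vertices, 3)
        positions on the unit sphere. Unlike shuffling, this keeps the
        spatial correlation structure of the input intact.
        """
        rotated = Rotation.random(random_state=seed).apply(sphere_coords)
        # each vertex borrows the features of the nearest original vertex
        _, nearest = cKDTree(sphere_coords).query(rotated)
        return features[nearest]
    ```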

    Setting the structural data to zero at the input and assessing test-time performance makes no sense and provides no real value.

    I suggest that the authors make their code available during peer review as well; otherwise, it is impossible to assess the reproducibility of their work.

    Finally, I believe 10 subjects is too small for the test dataset. A widely accepted convention is to reserve at least 10% of the total dataset for testing; I would recommend using 20 or 30 subjects.

  2. ### Reviewer #2:

    The authors use deep learning to map brain anatomy (cortical curvature and myelination) to retinotopic maps (eccentricity and polar angle) in individual subjects.

    My overall assessment of this work is that, although the idea is neat, the execution seems a bit rushed and the analysis somewhat lacking in depth.

    More specifically:

    1. This is my main concern: The evaluation of the method's ability to find fine-grained individual differences is somewhat anecdotal and not strongly backed by rigorous analyses.

    - The idiosyncratic differences shown in Fig 4a are intriguing, but they could also simply be explained by gross differences in the gyral patterns of these subjects.

    - The differences between the predictions for different subjects are much smaller than the within-subject prediction errors.

    - The authors should make these evaluations more quantitative, for example by delineating several visual areas in the empirical and predicted maps (in a blinded manner) and checking whether the sizes of the different visual areas are well predicted at the individual level. This could even be built into the model as a classifier for the different visual areas.

    - Using shuffled features as some sort of null is not appropriate in my opinion, as it breaks the statistics of the input. In fact, I am amazed that it has any predictive power at all, which it clearly does, seeing that the prediction errors are similar to those for the empirical data (Fig 6). Why is that? Is it the case, e.g., that the model learns the relation between the edges of the visual-area mask and the retinotopic map? What happens if you give the model a completely different mask as input (e.g., one arbitrarily expanded or contracted)? My guess is that the predictions will be vastly different and distorted.

    2. It is really unclear what the approach achieves beyond finding the borders between the primary regions V1, V2, and V3.

    - The authors should consider delineating more areas in the empirical data and showing that their predictions cover the full range in both dimensions, 0-360 deg of polar angle and 0-12 deg of eccentricity. This analysis would greatly inform the individual variations mentioned above.

    - One interesting suggestion by the authors is that the dorsal areas in the IPS actually have poor empirical retinotopy data (indeed, these areas might need specialised tasks, e.g. ones involving attentional components, i.e. attending to parts of the visual field [see Sereno et al.]). In fact, the empirical data seem to suggest that these regions cover a different hemifield in the shown test subjects, which is not what is expected. It would be interesting to see whether the model proposed here does indeed predict, e.g., polar-angle reversals in IPS1, 2, and 3 (I can see a hint of it in Fig 3). To me, even without empirical data to compare against, this would be a strong suggestion that the authors may be capturing some genuine structure-function relations.

    3. Some discussion around the modelling/quantification is lacking:

    - Errors in the polar angle are really high (~30 deg even in V1).

    - Related to a sub-point of comment (1): why does shuffling work? Can the authors show the actual predictions from the shuffled data (as opposed to the errors)? Do they look like retinotopic maps?

    - Do we need deep learning? Previous work has shown simple relations between V1, V2, V3 and the geometry of the brain. Does this model actually capture more fine-grained features?

    - I would have set this up as a regression against x,y coordinates in the visual field rather than polar coordinates (which have obvious wrap-around problems). This would avoid the use of tricks like rotating the visual field before training, as the authors did (see the sketch after this list).

    - The obvious deep-learning question: learning such a highly parameterised model from 180 x 2 hemispheres sounds hard. What evidence is there that this is not overfitting?

    - The authors mention in the methods that the 3D coordinates were also used as features, but in their Fig 2 it looks like the features are only curvature + myelin. Which is it? Are the 3D coordinates used as explicit features?
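
    Regarding the polar-coordinate issue above, switching to Cartesian visual-field targets is a simple change of variables (a sketch, assuming angles in degrees):

    ```python
    import numpy as np

    def polar_to_cartesian(angle_deg, ecc_deg):
        """Convert (polar angle, eccentricity) targets to (x, y) targets,
        which have no wrap-around discontinuity."""
        theta = np.deg2rad(angle_deg)
        return ecc_deg * np.cos(theta), ecc_deg * np.sin(theta)

    def cartesian_to_polar(x, y):
        """Map (x, y) predictions back for comparison with measured maps."""
        return np.rad2deg(np.arctan2(y, x)) % 360.0, np.hypot(x, y)
    ```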

    4. Show the data:

    - It would be good to see the features going into these predictions and their relationship with the targets, maybe even scatter plots of curvature/myelin vs polar angle (see the sketch after this list).

    - The subjects shown (the test set) have very noisy maps outside the early visual cortex. Where do they fall in the across-subject distribution of variance explained (Benson, J Vis, 2018)?
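
    A sketch of the feature-versus-target plot suggested above (`curvature`, `myelin`, and `polar_angle` are assumed to be per-vertex arrays restricted to the visual-cortex region of interest):

    ```python
    import matplotlib.pyplot as plt

    def feature_target_scatter(curvature, myelin, polar_angle):
        """Scatter each anatomical input feature against the retinotopic target."""
        fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
        axes[0].scatter(curvature, polar_angle, s=1, alpha=0.2)
        axes[0].set(xlabel="curvature", ylabel="polar angle (deg)")
        axes[1].scatter(myelin, polar_angle, s=1, alpha=0.2)
        axes[1].set(xlabel="myelin (T1w/T2w)")
        fig.tight_layout()
        return fig
    ```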

  3. ### Reviewer #1:

    The manuscript by Ribeiro, Bollmann and Puckett uses machine learning to predict, across individuals, retinotopic maps from cortical myelin and curvature maps. The authors use a sophisticated method (a convolutional network on a graph, here called geometric deep learning) and show appealing predicted maps of individual retinotopy in V1. While the work is interesting, the quality of the results is disappointing and the positioning with respect to the literature imprecise.

    The authors claim that their model is "able to predict retinotopic organization far beyond early visual cortex, throughout the visual hierarchy". However, the figures do not seem to support this claim: the qualitative figures do not show clear structure in the higher-level regions.

    Figure 1 is appealing; however, it should be compared to a simple average of all retinotopic maps. Likewise, the quantitative results in Supplementary Table 2 do not come with a comparison to the mean predictor (as with an R2 score), so it is not possible to judge whether these numbers represent good performance.
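
    Such a baseline is cheap to compute: the score below is positive only when the model beats the group-average map (a sketch; `pred` and `truth` are hypothetical arrays of per-subject vertexwise maps, and polar-angle residuals would need to be wrapped before squaring):

    ```python
    import numpy as np

    def r2_vs_group_mean(pred, truth):
        """R^2 with the group-average map as the null predictor.

        pred, truth: arrays of shape (n_subjects, n_vertices).
        Values <= 0 mean the model does no better than predicting
        the group-mean map for every subject.
        """
        group_mean = truth.mean(axis=0, keepdims=True)
        ss_res = ((truth - pred) ** 2).sum()
        ss_null = ((truth - group_mean) ** 2).sum()
        return 1.0 - ss_res / ss_null
    ```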

    Rather, Figure 6 shows that the models trained on shuffled and constant data perform qualitatively and quantitatively well. The proposed model does perform slightly better, but the statistical and practical significance of this improvement is unclear. The manuscript makes no clear attempt to judge statistical significance, and the small number of participants in the test set (10) makes it unlikely that significance would be attained. It would be beneficial to perform a complementary analysis on a larger cohort, for instance using the 3T HCP data, even at the cost of lower-quality data.

    Many prior works have shown the ability to predict functional organization from other kinds of imaging information, and in this respect the positioning of the present manuscript with regard to the literature is very unclear. The manuscript does acknowledge some prior work, including work using template warping, but claims that such work has not "been able to capture the detailed idiosyncrasies seen in the actual measured maps of those individuals". However, no precise argument is brought forward: no quantitative measure is compared with the prior publications, and no direct comparison is performed. Also, individual task functional topography has been inferred from other information such as anatomical connectivity [Saygin 2012], resting-state activity [Tavor 2016], and movie watching [Eickenberg 2017]. A discussion of the relative accuracy, or of the pros and cons, would have been interesting here.

    With this in mind, the title feels much too general: "Predicting brain function from anatomy using geometric deep learning".

    As a minor comment: controlling for the twin structure could be done in a more powerful way by isolating siblings within each of the train, validation, and test sets, so that no sibling pair is split across sets.
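
    In practice this is a group-aware split with family ID as the grouping variable (a sketch with scikit-learn; `family_ids` is assumed to come from the HCP restricted demographics). Applying it twice (train+validation vs test, then train vs validation) yields three family-disjoint sets:

    ```python
    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    def family_aware_split(subject_ids, family_ids, test_size=0.1, seed=0):
        """Split subjects so that no family is divided across the two sets."""
        subject_ids = np.asarray(subject_ids)
        gss = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                random_state=seed)
        train_idx, test_idx = next(gss.split(subject_ids, groups=family_ids))
        return subject_ids[train_idx], subject_ids[test_idx]
    ```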

    [Saygin 2012] Saygin, Zeynep M., et al. "Anatomical connectivity patterns predict face selectivity in the fusiform gyrus." Nature Neuroscience (2012).

    [Tavor 2016] Tavor, I., et al. "Task-free MRI predicts individual differences in brain activity during task performance." Science (2016).

    [Eickenberg 2017] Eickenberg, Michael, et al. "Seeing it all: Convolutional network layers map the function of the human visual system." NeuroImage (2017).

  4. ## Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Gaël Varoquaux (INRIA) served as the Reviewing Editor.

    ### Summary:

    The reviewers all expressed interest in the research agenda as well as the methods. However, they felt that the results did not demonstrate a clear and sufficient improvement over prior art. On the methodological side, the benefit of the deep-learning formulation was not clearly established. On the neuroscience side, the evidence that the method captures fine inter-individual differences was felt to be insufficient.