Interpreting the molecular mechanisms of disease variants in human transmembrane proteins



Next-generation sequencing of human genomes reveals millions of missense variants, some of which may lead to loss of protein function and ultimately disease. Here we investigate missense variants in membrane proteins, which are key drivers of cell signaling and recognition. Across 19,000 functionally classified variants in human membrane proteins, we find that pathogenic variants are enriched in the transmembrane region. Accurately predicting variant consequences requires understanding the reasons for pathogenicity. In soluble proteins, loss of stability has been shown to be a key mechanism underlying the pathogenicity of missense variants. Membrane proteins, however, remain widely understudied. Here we interpret variant effects on a larger scale for the first time, combining structure-based estimates of changes in thermodynamic stability, computed with a membrane-specific force field, with evolutionary conservation analyses of 15 transmembrane proteins. We find evidence that loss of stability causes pathogenicity in more than half of the pathogenic variants, indicating that it is a driving factor in membrane-protein-associated diseases as well. Our findings show how computational tools help provide mechanistic insight into variant consequences for membrane proteins. To enable broader analyses of disease-related and population variants, we include variant mappings for the entire human proteome.


Genome sequencing is revealing thousands of variants in each individual, some of which may increase disease risk. In soluble proteins, stability calculations have successfully been used to identify variants that are likely pathogenic due to loss of protein stability and subsequent degradation. This knowledge opens up potential treatment avenues. Membrane proteins make up about 25% of the human proteome and are key to cellular function, but stability calculations for disease-associated variants have not been systematically tested on them. Here we present a new protocol for stability calculations on membrane proteins that uses a membrane-specific force field, together with its proof-of-principle application to 15 proteins with disease-associated variants. We integrate stability calculations with evolutionary sequence analysis, allowing us to separate variants where loss of stability is the most likely mechanism from those where other protein properties, such as ligand binding, are affected.

Article activity feed

  1. Consolidated peer review report (23 September 2022)


    In this manuscript, Tiemann et al. take on a large-scale exploration of how mutations associated with disease impact calculated stability and conservation scores across the entire membrane proteome. The aim was to gain mechanistic insight into the causes of pathogenicity of missense mutations in human membrane proteins and to verify whether, as is the case for soluble proteins, mutational destabilisation of membrane proteins can explain disease. To do so, the authors use a framework they previously developed, which combines measures of stability change (ΔΔG) and sequence conservation (ΔΔE, the GEMME score) to predict the fitness effects of mutations from large-scale mutational data (Høie et al., 2022).

    By conducting a proteome-wide analysis of missense variants in human membrane proteins, the authors find decisively that pathogenic mutations are heavily enriched within the transmembrane region of membrane proteins. In addition, they report that they can sometimes use their calculated properties to classify residues based on their potential roles in stability or function, and that stability appears to be a major determinant of conservation and likely pathogenicity for GPCRs.

    The authors thus make meaningful strides towards explaining the clinical impact of variants within membrane proteins, a currently under-characterized yet important category of proteins. The analyses have been conducted in a rigorous way, and the data and protocols are openly available. This work will be of interest to researchers working on membrane proteins as well as those applying computational methods to biophysical systems.

    On the other hand, the choices made by the authors in terms of presentation make the identification of the main conclusions of the paper challenging. In part, this is likely due to fundamental technical challenges associated with calculating biophysical properties for membrane proteins. In addition, although the analysis was performed at the scale of the proteome, due to the decision to only consider X-ray crystallography structures, the number of proteins analyzed is rather small (15). It thus remains unclear how the findings are transferable to other membrane proteins and how robust the comparison between the different functional classes is.


    Revisions essential for endorsement:

    1. The authors are careful with what they claim, to the point where it becomes difficult to interpret the major messages. It appears there are many contributing factors to noise within these assays, resulting in complex figures that make it hard to interpret the data. The goal of presenting the data without overinterpreting it is noble, and the difficulty of digesting and presenting the comparisons in this work should be emphasized, but the complexity of the results made it difficult for reviewers to interpret without more robust processing. Further, we were not always certain how each result fits into the overall argument, which from our reading is whether the performance of predictors for classifying pathogenic mutations based on conservation and stability calculations provides insight into the mechanisms underlying membrane protein disease. Overall, we feel that clarifying the unifying argument of the manuscript and simplifying the figures would greatly improve the comprehensibility of this work. This could be achieved with one of the following approaches, although we leave the final choice to the authors: 
    • The manuscript could attempt to answer the following question: “Can existing methods be used to computationally determine whether pathogenic mutations are due to stability?” It would then explore why this question can or cannot be answered with the current analysis pipeline and existing tools. The answer is likely that the current tools are insufficient and the manuscript would thus point towards a future area of growth to be able to address the question.
    • The manuscript could focus on presenting the dataset. The results would be presented as preliminary examples of the kind of information that can be extracted and the type of analysis that may be done. In this case, claims such as “stability causes x% of pathogenic mutations” should be avoided, and the most important aspect of the manuscript would be that it accompanies a well-curated and openly available dataset, and provides links to it. In that context, the authors should mention whether there are existing curated and/or established databases of (human) membrane proteins, and how the dataset of putative membrane proteins compares with these resources.
    • The manuscript could focus on presenting the “computational approach”, which consists of mapping ddG-ddE, combined with an analysis of the localization of pathogenic (and non-pathogenic) mutations and the types of mutation (conservative, non-conservative etc.). Revisions would be needed to present results as examples of the kind of information this approach may provide.
    • The manuscript could possibly make a clear and compelling case for the idea that mutations of membrane proteins cause disease either because they destabilize the protein or because they occur at sites that are directly involved in function. This would require major revisions of the results and a systematic, clear and robust combined analysis of quadrant-location, protein-region-location, and amino-acid-type substitution.

    Related to the above, it would be useful to clarify upfront in the introduction what was expected from the study: did the authors expect the picture that emerged to be the same for membrane proteins as for soluble proteins? Are there different degradation pathways for these two classes of proteins, and is a loss of stability expected to have different consequences or not? In the end, the role of destabilization is rationalized in terms of buriedness and the amount of physico-chemical change upon mutation. Hence, are the results of the study saying something about the mechanisms of disease variants or simply about the physico-chemical composition and topology of membrane proteins? To answer this point, we suggest contextualizing the study more by expanding on the published literature. This would also clarify that the membrane protein folding field is very far behind the soluble protein folding field and that, as a result, we cannot expect the methods that work for soluble proteins to work for membrane proteins, or even know whether methods will mature to the point that they yield predictive results for membrane proteins.

    2. In general, uncertainties need to be better quantified and discussed and statistical tests included. For example:
    • The correlation between Rosetta estimates of ΔΔG and experimental ΔΔG is low (0.47), which means that less than 25% of the variance in experimental ΔΔG (0.47² ≈ 0.22) is accounted for by Rosetta. This uncertainty needs to be considered more carefully: it will likely affect the AUC (i.e. is AUC(ΔΔG) < AUC(ΔΔE) because not all mutations are pathogenic due to stability, or is this a mere consequence of the uncertainty of ΔΔG estimates?) and the number of points in the different quadrants (how many of the points in a quadrant are false positives or false negatives, and can we guess which they are by using other information such as the protein region, amino-acid-type change, ΔΔE value, etc.?).
    • A variant may fall in the “wrong” ΔΔG-ΔΔE quadrant because of the (large) ΔΔG error mentioned above, but also because of ΔΔE errors. This needs to be considered. Some estimate of the ΔΔE error needs to be made (e.g. by bootstrapping the alignment). Even in an ideal case in which ΔΔE depends only on ΔΔG, i.e. both ΔΔG_Rosetta and ΔΔE are estimates of a “true” ΔΔG, not all points would fall on the y = x line in the ΔΔE-ΔΔG plane. How many points would there be in each of the quadrants because of mere estimation errors?
    • As the authors state, quadrant IV has few points. But it also seems that there are more blue points than red points in regions further away from the axes. Could the authors comment on this observation? Is there a tendency for the ΔΔG measure to “over-predict” pathogenicity?
    • Throughout the manuscript the authors compare different groupings to drive their narrative. For example, on line 115 the authors discuss the enrichment of pathogenic mutations within the transmembrane domains, which then leads to many subsequent explorations of why TMs may be involved in disease. For this comparison, the difference is large and clearly visible, so a statistical test for significance may not be needed. However, many other comparisons are harder to interpret because of the multiple groupings, the complex data representation, and the fundamentally complex nature of the study. In these cases, we would like to see more robust statistical tests. For example, on line 184, after breaking up the data in 2B based on ΔΔG and ΔΔE cutoffs, the authors write “...only a few variants (14.2%) falling in the quadrant of low ΔΔE and ΔΔG…”. It is unclear what “a few” means or whether this is a significant reduction in variants compared to other quadrants.
    3. Regarding the performance of Rosetta to measure ΔΔGs:
    • The authors state that pathogenic mutations causing loss of stability are more often located in the interior of the protein (buried), implying bigger physico-chemical property changes. Isn’t that expected from Rosetta design? Indeed, while the analysis of the distribution of variants among protein regions (buried, etc.) and mutation-type (hydrophobic-to-hydrophobic, etc.) does add additional information to support the hypothesis that in some cases stability loss causes disease, it is important to recognize that this is not completely independent evidence because any ΔΔG predictor should somehow capture the observed patterns.
    • ROC curves are used to determine how well ΔΔG predicts pathogenicity, as a follow-up to the observation that pathogenic mutations are enriched in TM regions of membrane proteins. The intuition here is that deleterious mutations within TMs are likely disrupting folding and therefore a ΔΔG-based predictor should do relatively well. However, the authors find that Rosetta-based ΔΔG calculations do not perform well across the full set of membrane proteins with solved crystal structures and both benign-like and pathogenic mutations (Figure 2A). In contrast, ΔΔG works quite well when the analysis is restricted to GPCRs (Figure 3A). One interpretation could be that stability is not a major driver of membrane protein disease; however, in many cases it is, such as for Rhodopsin and CFTR. An alternative explanation is that Rosetta does not predict stability well for mammalian membrane proteins, and in fact the authors discuss this at length in the limitations of the study section, explaining that this is because Rosetta is trained on many bacterial beta-barrel membrane proteins. We appreciated this section but would have preferred more of this discussion earlier on, as it could aid in understanding why the ΔΔG predictors do not perform accurately in Figure 2A.
    • Could the authors clarify what they mean by “where the Rosetta energy function suggested a potential incompatibility between the experimental structure and the Rosetta energy function”?
    4. Regarding ΔΔE, in the present work, there is an implicit assumption that the constraints that operate during evolution of the aligned sequences, across species, as captured by GEMME, are the same constraints that affect the variants within a population, and therefore determine whether a variant will be pathological/non-pathological. This is a major assumption that needs to be spelled out and discussed. Mentioning this will help interpret “misplaced” points of the ΔΔE-ΔΔG map.
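
The estimation-error question raised above (how many variants land in each ΔΔG-ΔΔE quadrant through noise alone) can be made concrete with a small simulation. The sketch below is not the authors' pipeline. It assumes a single standardized "true" destabilization score, treats the reported correlation of 0.47 as the correlation between the Rosetta estimate and that true score, and uses a purely hypothetical correlation of 0.70 for the ΔΔE estimate. Both scores are oriented so that higher means more damaging, with a cutoff at zero, which does not follow the sign conventions or cutoffs of the manuscript.

```python
import math
import random


def simulate_quadrants(r_ddg=0.47, r_dde=0.70, n=20000, seed=0):
    """Quadrant occupancy when two scores are noisy estimates of one true value.

    Assumptions (illustrative, not taken from the manuscript):
      * the "true" destabilization t is standard normal;
      * each estimate is r * t + sqrt(1 - r^2) * noise, so it has unit
        variance and correlation r with t;
      * higher means more damaging for both scores, and the cutoff is 0.
    """
    rng = random.Random(seed)
    counts = {"both_high": 0, "ddg_only": 0, "dde_only": 0, "both_low": 0}
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        ddg = r_ddg * t + math.sqrt(1 - r_ddg ** 2) * rng.gauss(0.0, 1.0)
        dde = r_dde * t + math.sqrt(1 - r_dde ** 2) * rng.gauss(0.0, 1.0)
        if ddg > 0 and dde > 0:
            counts["both_high"] += 1
        elif ddg > 0:
            counts["ddg_only"] += 1
        elif dde > 0:
            counts["dde_only"] += 1
        else:
            counts["both_low"] += 1
    return {name: count / n for name, count in counts.items()}


fractions = simulate_quadrants()
discordant = fractions["ddg_only"] + fractions["dde_only"]
# Closed form for two unit-variance normals with correlation rho and cutoffs
# at 0: P(discordant) = arccos(rho) / pi, where rho = r_ddg * r_dde here.
expected = math.acos(0.47 * 0.70) / math.pi
print(f"variance explained by the ddG estimate: {0.47 ** 2:.2f}")
print(f"simulated discordant fraction: {discordant:.3f} (closed form {expected:.3f})")
```

Under these assumptions the closed form gives a discordant fraction of about 0.39, so a substantial share of off-diagonal points would be expected even if both scores measured exactly the same underlying quantity. Real cutoffs, score distributions, and error models would change the numbers; the point is only that a null expectation of this kind can be computed and compared against the observed quadrant counts.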

    Additional suggestions for the authors to consider:

    1. The comparison of pathogenic/non-pathogenic mutations should consistently be made across the various sections of the paper. In too many cases in the present version of the paper this comparison is not emphasized. In some cases, the distribution of variants is described, without clearly differentiating pathogenic from non-pathogenic. In other cases, only pathogenic variants are considered, without comparing with the non-pathogenic cases.
    2. Moving the section on the two specific proteins to the end of results would likely improve the flow of the paper. The A/B x ΔΔE-ΔΔG plane analysis would be presented first, then the A/B x ΔΔE-ΔΔG x “protein regions” analysis, and finally the A/B x ΔΔE-ΔΔG x regions x “aa-type” analysis before ending with examples.
    3. The choice to restrict the analysis to X-ray crystallography structures from the PDB is not obviously well justified. Indeed, the coverage of membrane proteins by the PDB is rather low, and the authors found that less than 30% of all annotated human membrane proteins have at least some part resolved. One of the potential advantages of the AlphaFold database is to improve this coverage, and the analyses presented by the authors would thus benefit from considering predicted models displaying high confidence values.
    4. In Figure 2, the authors define two classes of variants in their dataset, group A (pathogenic variants) and group B (benign or non-pathogenic with an allele frequency > 9.9 × 10^-5). They then test their models’ ability to distinguish between groups A and B by constructing ROC curves for Rosetta ΔΔG and GEMME ΔΔE. To visualize variant effects, they plot individual variants on a ΔΔG vs. ΔΔE plane, and then use this plot to further classify variants based on their combined ΔΔG and ΔΔE values. The allele frequency cutoff is so important for generating group B that all downstream analysis depends on it. Because these residues come from a much more limited set of proteins, we think it would be useful to include a comparison showing that the gnomAD allele frequency cutoff of > 9.9 × 10^-5 remains informative for differentiating between benign and pathogenic residues.
    5. In Figure 3, the authors apply their analysis to variants across all GPCRs, as well as to GPCR transmembrane regions alone. The ROC curves in panel A show much better discrimination when the analysis is applied to just this protein family, as also seen in panel B, where variants fall into very clear subpopulations within each quadrant. The illustration and category definitions on the left of panel C are a helpful guide for the discussion of different variant types and their relevance to protein stability versus function. However, the plot on the right of panels C and D is confusing and not immediately intuitive, making it difficult to follow the comparisons discussed in the text. Indeed, the authors state that “Pathogenic variants in GPCRs, especially in the transmembrane region, lose function mostly by loss of stability”. Comparing these two panels, it is concluded that the pathogenic variants that do not lose stability are more often found in the TM regions of GPCRs compared to all datasets. This is somewhat confusing, and the numbers supporting this affirmation in Fig 3C seem quite low.
    6. The authors do not extensively discuss their results in the context of the membrane protein field or of the specific membrane proteins they highlight, such as Rhodopsin and GTR1 (Figure 4). For Rhodopsin, at least, there has been extensive work on its folding by Jonathan Schlebach’s lab and others, including a mutational scan. It could be useful to at least contextualize and contrast the results here with previously published work.
    7. In Figure 5, the authors consider whether the identities of the starting and mutant residues correlate with their overall quadrants. Panel A is extremely difficult to interpret. We are also unsure how robust any differences are likely to be, given the uneven sampling and the small number of samples in some of the boxes. Narrowing the comparisons (changed vs. unchanged property, A vs. B) would likely improve comprehension and may be more meaningful. Panel B is, on the other hand, a wonderful example of how to display complex, multidimensional data in a comprehensible way. The well-known association between hydrophobicity and transmembrane stability emerges directly from the data, as does its potential discordance with evolutionary conservation. We find this correlation even more striking given that the hydrophobicity scale used here was explicitly determined in the context of transmembrane regions, but the variants are drawn from all regions of the targets. We were curious to know what percentage of these are drawn from the transmembrane vs. soluble regions of the targets.
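
For the group A versus group B comparison discussed in suggestion 4 above, one way to report how well a score separates the two groups, together with its uncertainty, is to compute the AUC as a rank statistic and attach a percentile bootstrap interval. The following is a minimal sketch using only the Python standard library. The function names and the toy scores are hypothetical and are not data from the manuscript, and the bootstrap scheme (resampling each group independently) is one reasonable choice among several.

```python
import random


def auc(pos, neg):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), over all pairs."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos_sample = [rng.choice(pos) for _ in pos]
        neg_sample = [rng.choice(neg) for _ in neg]
        stats.append(auc(pos_sample, neg_sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical scores (higher = predicted more damaging), for illustration only.
group_a = [3.1, 2.4, 0.8, 4.0]   # stand-in for pathogenic variants
group_b = [0.2, 1.0, 0.5]        # stand-in for benign / common variants

point = auc(group_a, group_b)    # 11 of the 12 pairs are ordered correctly
lower, upper = bootstrap_auc_ci(group_a, group_b)
print(f"AUC = {point:.3f}, 95% bootstrap interval = [{lower:.3f}, {upper:.3f}]")
```

Repeating such a calculation for alternative allele-frequency cutoffs, or for ΔΔG and ΔΔE separately, would show whether differences between AUC values exceed their sampling uncertainty. With groups as small as in this toy example the interval is very wide, which is itself the kind of information the report asks to be made explicit.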


    Reviewed by:

    Willow Coyote-Maestas Paper Discussion Group, UCSF, USA: membrane proteins; high throughput experimental variant screening; developing assays for measuring how mutations break membrane proteins in order to explore how mutations alter folding, trafficking, and function of membrane proteins (see Appendix for group members).

    Julian Echave, Professor, Universidad Nacional de San Martín, Argentina: theoretical and computational study of biophysical aspects of protein evolution.

    Elodie Laine, Associate Professor, Sorbonne Université, France: development of methods for predicting the effects of missense mutations using evolutionary information extracted from protein sequences and/or structural information coming from molecular dynamics simulations.

    Curated by:

    Lucie Delemotte, KTH Royal Institute of Technology, Sweden


    Willow Coyote-Maestas Paper Discussion Group:

    Feedback was generated in a meeting of the journal club involving:

    Willow Coyote-Maestas

    Christian Macdonald

    Donovan Trinidad

    Patrick Rockefeller Grimes

    Matthew Howard

    Arthur Melo

    (This consolidated report is a result of peer review conducted by Biophysics Colab on version 1 of this preprint. Minor corrections and presentational issues have been omitted for brevity.)