Interpreting the Effects of DNA Polymerase Variants at the Structural Level Using MAVISp and Molecular Dynamics Simulations
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Review Commons)
Abstract
Genetic variants in the DNA polymerase enzymes POLE and POLD1 can affect protein function by altering stability, catalysis, DNA binding, and interactions with other biomolecules. Understanding the structural basis of these variants is important for a comprehensive interpretation of variant impact. In this study, we used MAVISp, a modular structure-based framework, and molecular dynamics simulations to analyze over 60,000 missense variants of POLE and POLD1. By integrating results on changes in folding and binding free energies, and on local alterations in the proximity of the active or phosphorylation sites, we provided a detailed structural interpretation of variants reported across various databases, including ClinVar, COSMIC, and cBioPortal. Moreover, we predicted the functional consequences of variants not yet found in disease-related databases, thereby creating a comprehensive catalogue for future studies. Of note, our approach enabled us to classify 364 Variants of Uncertain Significance (VUS) as PP3 evidence and 323 as BP4 evidence, in accordance with the American College of Medical Genetics and Genomics (ACMG) guidelines. Additionally, we identified a group of variants that could alter the native orientation of the residues within the catalytic site of the exonuclease domain, such as POLE variants P297S and P436R. Finally, we identified a group of variants predicted to affect DNA-binding affinity and rationalized their effects in terms of different energetic contributions and structural features. Collectively, our results not only advance our understanding of protein variant effects in POLE and POLD1 at the structural level but also support future studies aimed at variant classification, variant prioritization for experimental studies, and functional interpretation across diverse biological contexts.
Article activity feed
Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
__Reviewer #1__
*This study "Interpreting the Effects of DNA Polymerase Variants at the Structural Level" comprises an in-depth analysis of protein sequence variants in two DNA polymerase enzymes with particular emphasis on deducing the mechanistic impact in the context of cancer. The authors identify numerous variants for prioritisation in further studies, and showcase the effectiveness of integrating various data sources for inferring the mechanistic impact of variants. *
*All the comments below are minor, I think the manuscript is exceptionally well written. *
*> The main body of the manuscript has almost as much emphasis on usage of the MAVISp tool as analysis of the polymerase variants. I don't think this is an issue, as an illustrated example of proper usage is very handy. I do, however, think that the title and abstract should better reflect this emphasis. E.g. "Interpreting the Effects of DNA Polymerase Variants at the Structural Level with MAVISp". This would make the paper more discoverable to people interested in learning about the tool. *
We have changed the manuscript title according to the reviewer’s suggestions, and the current title is “Interpreting the Effects of DNA Polymerase Variants at the Structural Level using MAVISp and molecular dynamics simulations.”
*> Figure 1. I don't believe there is much value in showing the intersection between the datasets (especially since the in-silico saturation dataset intersects perfectly with all the others). As an alternative, I suggest a flow-chart or similar visual overview of the analysis pipeline. *
We moved the former Figure 1 to SI. We decided to keep it at least in SI because it provides guidance on the number of variants relative to the total reported across the different disease-related datasets annotated with the MAVISp toolkit. On the other hand, the suggestion of a visual scheme for the pipeline followed in the analyses is a great idea. We have thus added Figure 1, which illustrates the pipeline workflows for analysis of known pathogenic variants and for discovery of VUS and other unknown variants, as suggested by the reviewer.
*> Please note in the MAVISp dot-plot figure legends that the second key refers to the colour of the X-axis labels rather than the dots *
We have revised the code that produces the dotplot so the second key is placed closer to the x-axis and clearer to read.
*> Missing figure reference (Figure XXX) at the bottom of page 16 *
We apologize for this mistake. Figures, contents, and the order have changed significantly to address all reviewers’ comments; this statement is no longer included. Also, we have carefully proofread the final version of the manuscript before resubmitting it.
__Reviewer #2__
This manuscript reports a comprehensive study of POLE and POLD1 annotated clinical variants using a recently developed framework, MAVISp, that leverages scores and classifications from evolutionary-based variant effect predictors. The resource can be useful for the community. However, I have a number of major concerns regarding the methodology and the presentation of the results.
** On the choice of tools in MAVISp and interpretation of their outputs
*- Based on the ProteinGym benchmark: https://proteingym.org/benchmarks, GEMME outperforms EVE for predicting the pathogenicity of ClinVar mutations, with an AUC of 0.919 for GEMME compared to 0.914 for EVE. Thus, it is not clear for me why the authors chose to put more emphasis on EVE for predicting mutation pathogenicity. It seems that GEMME can better predict this property, without any adaptation or training on clinical labels. *
We appreciate this comment, but we would not exclude EVE entirely from our data collection, or from VEP coverage in MAVISp, based on an AUC difference of 0.005. It was not our intention to place more emphasis on EVE predictions, and we have revised the text accordingly. We would like to clarify the workflow we use for applications of the MAVISp framework in “discovery mode,” i.e., for variants not reported as pathogenic in ClinVar. It relies on AlphaMissense to prioritize potentially pathogenic variants, and then retains only those that also have an impact according to DeMaSk, which provides a further indication of loss or gain of fitness. DeMaSk fits the MAVISp framework well, as it was trained on data from experimental deep mutational scans, which we generally import into the EXPERIMENTAL_DATA module. We have revised the text to make this clearer. GEMME and EVE (or REVEL) can be used for complementary analyses in the discovery workflow. Other users of MAVISp data might want to combine them in a different design, and they have access to all the original scores in the MAVISp database CSV file, as well as to the code for downstream analysis, to do so. The choice for our MAVISp discovery workflow is mainly dictated by the fact that we do not always have full coverage of all variants for EVE, GEMME, and REVEL in many protein entries. In particular, since the reviewer highlights GEMME over EVE, GEMME is currently unavailable for a few cases in the MAVISp database. This is because we need to rely on an external web server to collect the data, which slows down data collection on our end.
Additionally, we have encountered instances where GEMME was unable to provide an output for inclusion in the MAVISp entries. When we designed the workflow for variant characterization in focused studies, we also made practical considerations. We are also exploring the possibility of using pre-calculated GEMME scores from
https://datadryad.org/dataset/doi:10.5061/dryad.vdncjsz1s, but we have encountered some challenges that deserve further investigation. For example, MAVISp annotations rely on the canonical isoform as reported in UniProt, which can lead to mismatches with the GEMME pre-computed scores. So far, we have identified a couple of entries whose canonical isoforms no longer match the ones in the pre-computed GEMME score dataset. Another limitation is the absence of the original MSA files in the dataset, which we would need for a more in-depth comparison with the ones we used for our calculations. We are also facing challenges in reproducing the MSA output from the MMseqs2-based ColabFold protocol in this context, which need to be solved first. Overall, the dataset shows potential for integration into MAVISp, but we need to define the inclusion criteria and compare it with the existing results in more detail.
Additionally, since the principle behind MAVISp is to provide a framework rooted in protein structure, AlphaMissense was the most reasonable choice for us as the primary indicator among the VEPs for our discovery workflow, and it has performed reasonably well in this case study and others.
Of course, our discovery design is one of the many applications and designs that could be envisioned using the data provided and collected by MAVISp. We also include all raw scores in the database's final CSV files, allowing other end users to decide how to use them in their own computational design. The design choice we made for the discovery phase of focused studies, using MAVISp to identify variants of interest for further studies, has been applied in other publications (see https://elelab.gitbook.io/mavisp/overview/publications-that-used-mavisp-data) in some cases together with experiments. It is also a fair choice for the application, as the ultimate goal is to provide a catalog of variants for further studies that may have a potentially damaging impact, along with a corresponding structural mechanism.
We have now revised the results section where Table 1 is cited to clarify this. We also revised the terminology, because we use the VEPs to predict damaging variants rather than pathogenic ones. Experiments on disease models should validate our predictions before concluding whether a variant is pathogenic in a disease context, and we want to avoid misunderstandings among readers regarding our stance on this matter.
- Which of the predictors, among AM, EVE, GEMME, and DeMaSk, provide a classification of variants and which ones provide continuous scores? This should be clarified in the text. If some predictors do not output a classification, then evaluating their performance on a classification task is unfair. The MAVISp framework sets thresholds on the predicted scores to perform the classification, and it is unclear from reading the manuscript whether these thresholds are optimal or whether using universal cutoff values is pertinent. For instance, for GEMME, a recent study shows that fitting a Gaussian mixture to the predicted score distribution yields higher accuracy than setting a universal threshold (https://doi.org/10.1101/2025.02.09.637326*). Along this line, for predictors that do not provide a classification, I am not convinced of the benefit for the users of having access to only binary labels, instead of the continuous scores. The users currently do not have any idea of whether each variant is borderline (close to threshold) or confident (far from threshold). *
We agree with the reviewer; this is due to us not being sufficiently clear in the manuscript. We have now revised the first part of the results to clarify this and to explain how we use the MAVISp data in focused studies, where the goal is to identify the most interesting variants, i.e., those that are potentially damaging and have a linked structural mechanism. Of course, there are other applications for leveraging the data in the database. We do provide continuous scores for the variants, not just classification labels, in the MAVISp CSV file. They can be accessed, together with the full dataset, through the MAVISp website and reused for any application.
Additionally, in the revised manuscript we used the scores for the VUS ranking (Figure 5), applying a strategy recently added to the MAVISp downstream analysis toolkit (https://github.com/ELELAB/MAVISp_downstream_analysis), thereby taking the scores themselves into account. Also, in the final part of the manuscript, the VEP scores have been used to introduce the ACMG-like classification of the variants, in response to reviewer 3 (Figure 9 and Tables S3-S4). We absolutely agree that it is informative to keep the continuous scores, and we have never overlooked this aspect. However, we also need a strategy with a simpler classification to highlight the most interesting variants among thousands or more when starting an exploration. This is why we included support with dotplots and lollipop plots, for example. Our purpose here is to identify, among many cases, those with a potentially damaging signature (for which we need a binary classification, for simplicity). Next, we evaluate whether this signature entails a fitness effect (with DeMaSk), and finally we retain only the cases for which we can identify a structural mechanism to study further.
The thresholds we set as the default for dotplot analysis of GEMME and DeMaSk data are discussed in __Supplementary Text S3__ of the original MAVISp article. In brief, we carried out a ROC analysis against the scores for known pathogenic and benign variants in ClinVar with a review status of 2 or higher. For applicative purposes, one could also design other strategies to analyze the MAVISp data; it is not limited to the workflow we set as the primary one for our focused studies, as already mentioned above.
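To illustrate the threshold-setting idea, here is a minimal pure-Python sketch that picks a cutoff by maximizing Youden's J over a ROC-style sweep. The helper function, scores, and labels are illustrative only and are not the actual MAVISp code or thresholds:

```python
def youden_threshold(scores, labels):
    """Pick the cutoff maximizing sensitivity + specificity - 1 (Youden's J)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Illustrative predictor scores for variants with known labels
# (1 = pathogenic, 0 = benign); the values are made up for this example.
scores = [0.9, 0.8, 0.75, 0.6, 0.85, 0.4, 0.3, 0.2, 0.1, 0.35]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
best = youden_threshold(scores, labels)  # 0.6 separates the two classes here
```

In practice, one would run such a sweep (or a library ROC routine) against ClinVar-annotated variants with a sufficient review status, as described above.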
We have now also included a classification based on a GMM applied to the GEMME scores for POLE and POLD1, so that it can be evaluated against other designs for our proteins of interest (see Table 1 in the revised version). The Methods section has been revised to include this part, and the ProteoCast preprint is cited as a reference. We have not yet officially included this classification in the MAVISp database, because we must first follow internal protocols to meet the inclusion criteria for new methods or analyses. We will do so by performing a similar comparison on the entire MAVISp dataset, focusing on high-quality variant annotations such as those in ClinVar, as we did to set the current thresholds for GEMME in Supplementary Table S3 of the original MAVISp article. We need to allocate time and resources to this pilot, which is scheduled for Q1 2026.
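For readers curious about the GMM-based alternative to a universal cutoff, the idea can be sketched with a minimal 1-D expectation-maximization loop. This is illustrative only (the actual analysis follows the ProteoCast approach on real GEMME score distributions), and the example scores are invented:

```python
import math

def fit_gmm_1d(xs, iters=200):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    xs = sorted(xs)
    n = len(xs)
    mu = [xs[n // 4], xs[(3 * n) // 4]]   # crude quartile-based initialization
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: update weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
    return w, mu, var

# Illustrative GEMME-like scores: one "damaging" mode and one "neutral" mode
scores = [-3.2, -3.1, -3.0, -2.9, -2.8, -0.2, -0.1, 0.0, 0.1, 0.2]
w, mu, var = fit_gmm_1d(scores)
damaging_mean = min(mu)   # the more negative component collects damaging variants
```

A variant would then be assigned to whichever component gives it the higher responsibility, instead of being compared against a single universal threshold.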
** On the presentation and impact of the results
- While reading the manuscript, it is difficult to grasp the main messages. The text contains abundant discussion about the potential caveats of the framework, the care that should be taken in interpreting the results, and the dependency on the clinical context. Although these aspects are certainly important, this extensive discussion (spread throughout the manuscript) obscures the results. Moreover, the way variants are catalogued throughout the text makes it difficult to grasp key highlights. The reader is left unsure about whether the framework can actually help the clinical practitioners.
We have revised the text to make it easier to read, including additional MD simulations of three variants of interest and further downstream analyses to clarify their mechanisms of action. We also added a recap of the most interesting variants and their associated mechanisms, along with a ranking of the VUS using the different features available in the MAVISp CSV file. We hope that this makes the work more accessible and valuable. In the original submission, Table 2 aimed to provide a summary of the interesting variants, and we have now revised it in light of the ranking results and the additional analyses, which allow us to clarify the mechanisms of action further. We have also introduced __Figure 9 and Tables S3 and S4__, which present data on the ACMG-like classification of VUS that can fall into the likely pathogenic or likely benign categories.
- In many cases, the authors state that experimental validation is required to validate the results. Could they be more explicit on the experimental design and the expected outcome?
We have added a section addressing the point above on pages 21 and 30, where, alongside the summary of mechanisms per variant, we propose the experimental readouts to use, based on known MAVE assays or assays that could be designed.
- AlphaMissense seems to tend to over-predict pathogenicity. Could the authors comment on that?
We are unsure whether this comment relates to our specific case or to a general feature of AlphaMissense.
In the latest iteration of our small benchmarking dataset for POLE and POLD1 (as shown in the paper), we achieve a sensitivity of 1 and a balanced specificity of 0.96 for AlphaMissense, which suggests that AlphaMissense does not significantly over-predict pathogenicity in these proteins and predicts non-pathogenic (true negative) mutations quite accurately. As its performance was sufficient in our case, we deemed recalibrating the classification threshold for AlphaMissense unnecessary.
We are aware that this is not necessarily the case for every gene; e.g., it has been shown that AlphaMissense has lower specificity in some cases (see e.g. 10.3389/fgene.2024.1487608, 10.1038/s41375-023-02116-3). This is also why we found it essential to evaluate its performance with its recommended classification on a gene-specific basis, as done here. In the future, we will keep a critical eye on our predictors to understand whether they are suitable for the specific case study, or whether they require threshold recalibration or the use of a different predictor.
** On specific variants
- The mention of H1066R, H1068, and D1068Y is very confusing. There seems to be a confusion between residue numbers and amino acid types.
We have revised the text for typos and errors. This part of the text changed, so these specific variants are no longer mentioned.
- A major limitation of the 3D modeling is the impossibility to include Zn2+ coordination by cysteine residues. This limitation holds for both POLE and POLD1. Could the authors comment on the implications of this limitation for interpreting the mechanistic impact of variants? In particular, there are several variants reported in the study that consist in gains of cysteines. The authors discuss the potential impact of some of these mutations on structural stability, but not on Zn coordination or the formation of disulphide bridges.
This is a great suggestion. We had long planned to include a module to address changes in cysteines. We have now used this occasion to add a new module that identifies mutations that: 1) are likely to disrupt native disulfide bridges, annotating them as damaging; or 2) could lead to de novo formation of disulfide bridges upon mutation of a residue to a cysteine, also annotated as damaging with respect to the original functionality. We also included a step that evaluates whether the protein target is eligible for the analysis based on its cellular localization, since the redox conditions in specific compartments (such as the nucleus) would not favour disulfide bridges. The module has been added to MAVISp, and we are collecting data with it for the existing entries in the database, to be released in one of the upcoming updates. More details are on the website in the Documentation section (https://services.healthtech.dtu.dk/services/MAVISp-1.0/). We could not apply the module to POLE and POLD1, since they are nuclear proteins and it would not be meaningful to look into this structural aspect, either in connection with loss of native cysteines or with de novo disulfide bridge formation upon mutations that change a wild-type residue to a cysteine.
We would like to clarify that the structures we use here have been modelled with zinc at the correct position, as this is a focused study rather than the first high-throughput data collection for inclusion in the MAVISp database. It is only the first layer of high-throughput collection with MAVISp that uses models without cofactors, unless the biocurator attempts to model them or we collect further data for research studies (as done here). Prompted by this confusion, we have now added a field to the metadata of a MAVISp entry indicating the cofactor state. Nevertheless, the RaSP stability prediction does not account for the cofactor's presence, even when it is bound in the model. This is discussed in the Methods section. We thus did not further analyze the variants at sites directly coordinating the metal groups, due to these limitations.
- MAVISp does not identify any mechanistic effect for a substantial portion of variants labelled as pathogenic. Could the authors comment on this point?
We are not sure how to interpret this question; it can be read in two ways. Either the reviewer is asking about known pathogenic ClinVar variants without mechanistic indicators, or, more generally, about the variants we label as “pathogenic” in discovery mode (we usually refer to them as damaging in the dotplots) for which we cannot associate a mechanism.
Overall, as a general consideration, it would be challenging to envision a mechanism for each variant predicted to be functionally damaging. For example, in the case of POLE and POLD1, we still lack models of complexes that did not meet the quality-control and inclusion criteria for the binding-free-energy scheme used by the LOCAL INTERACTION module. Also, when it comes to effects on catalysis or to analyzing effects in more detail at the cofactor sites, we could miss effects that would require QM/MM calculations. Other points we have not yet covered include cases related to changes in protein abundance due to degron exposure for degradation, which is one of the mechanistic indicators we are currently developing. Moreover, we used only unbiased molecular simulations of the free protein, and we would need future studies with enhanced sampling approaches and longer timescales to better address conformational changes and changes in the population of different protein conformational states induced by the mutation (including DNA). This can be handled formally by the MAVISp framework using metadynamics approaches, but it would be outside the scope of this work and is a direction for future studies on a subset of variants to investigate in even greater detail.
Furthermore, we currently cover phosphorylation but not other types of post-translational modification. In any case, our scope is to use the platform to provide a structure-based characterization of either known pathogenic variants or potentially damaging ones predicted by VEPs, and to focus on more detailed analyses of those. As we develop MAVISp further and design new modules, we will also be able to tackle other mechanistic aspects. This discussion, however, is more relevant to the MAVISp method paper itself.
Moreover, none of the variants discussed are associated with allosteric effects. Is this expected?
In general, allosteric mutations are rare. Nevertheless, in these case studies, the size of the proteins under investigation also poses some challenges for the underlying coarse-grained model used in the simple mode to generate the allosteric signalling map, as we have found it performs best on protein structures below 1000 residues.
__Reviewer #3 (Evidence, reproducibility and clarity (Required)):__
The manuscript utilized the MAVISp framework to characterize 64,429 missense variants (43,415 in POLE and 21,014 in POLD1) through computational saturation mutagenesis. The authors integrate protein stability predictions with pathogenicity predictors to provide mechanistic insights into DNA polymerase variants relevant to cancer predisposition and immunotherapy response. There are discussions of known PPAP-associated variants and somatic cancer mutations in the context of known data and some proposed variants of interest (which are not validated).
Major comments:
I was unaware of the MAVISp framework. It concerns me that, albeit this paper has a lot of technical details about the framework, it is not the paper about the framework. I did look into the paper https://www.biorxiv.org/content/10.1101/2022.10.22.513328v5, which keeps being updated (version five now) over three years, but I do not see a peer-reviewed version. It would be unfair of me to peer review the underlying framework of the work, but together with the previous comments, I am a bit concerned.
We have intentionally left the MAVISp resource paper as a living preprint until we had sufficient data in the database to be useful to the rest of the community. We have been actively revising the manuscript, thanks to comments from users on previous versions, to ensure it provides a solid resource. Approximately one and a half years ago, we attempted a submission to a high-impact journal and even addressed the reviewers' comments there. Still, we did not receive feedback for a long time, and ultimately the manuscript was not sent back to the reviewers, despite more than six months of work on our side. After that, we realized that we would benefit from collecting a larger dataset; we invested time and effort in that and submitted again for review, this time through Review Commons, in the summer of 2025. The paper has now been peer-reviewed by three reviewers through Review Commons. We submitted the revised version and response to reviewers, and it is now under revision with Protein Science. The reviewers' comments and our responses can be found under “Latest Refereed Preprints” on the Review Commons website, dated 17th of October 2025.
We would also like to clarify another point. In our experience, it is common practice to keep software papers on bioRxiv, even for a long time, and to bring them to a more complete form in parallel with the community already applying the tool. This allows feedback from peers in a broad manner. We had similar experiences with MoonlightR, where the first publications with applications within the TCGA PanCancer papers came before the publication of the tool itself, and the same has been true for our main workflows, such as MutateX and RosettaDDGPrediction, which are widely used by the community. Finally, it should be considered that the MAVISp framework has already been used in several published peer-reviewed studies (since 2023), attesting to its integrity and potential. The reviewer can read more about the studies that used MAVISp data or modules here: https://elelab.gitbook.io/mavisp/overview/publications-that-used-mavisp-data
For example, the authors are using AlphaFold models to predict DDG values. Delgado et al. (2025, Bioinformatics) explicitly tested FoldX on such models and concluded that "AlphaFold2 models are not suitable for point mutation ΔΔG estimation" after observing a correlation of 0.06 between experimental and calculated values. AlphaFold's own documentation states it "has not been validated for predicting the effect of mutations". Pak et al. (2023, PLOS ONE) showed correlation between AlphaFold confidence metrics and experimental ΔΔG of -0.17. Needless to say that these concerns seriously undermine the validity of a major part of the study.
We appreciate the reviewer’s comments and would like to clarify a point regarding the MAVISp STABILITY module, which we believe may have been misunderstood. Based on the studies cited by the reviewer, which critique the use of AF-generated mutant structures for assessing stability effects, we understand that this assumption may have led to the concern.
The STABILITY module utilises three in silico tools (FoldX, Rosetta, and RaSP) to assess changes in protein stability resulting from missense mutations. Importantly, the input to these assessments consists of AF models of the WT protein structures, not of AF-generated mutant structures. The mutants are generated using the FoldX and Rosetta protocols, along with estimates of the changes in free energy. For further details and clarification, we kindly refer the reviewer to the MAVISp original publication.
Also, one should consider the goal of our free-energy calculations: not to identify exact ΔΔG values, but to correlate with data from in vitro or biophysical experiments, as well as with cellular experiments such as MAVEs. We, and other researchers, have shown good agreement (see the PTEN case study in the original MAVISp publication, https://pmc.ncbi.nlm.nih.gov/articles/PMC5980760/, https://pubmed.ncbi.nlm.nih.gov/28422960/, 10.7554/eLife.49138). Also, before even designing the STABILITY module for MAVISp, we had verified that we can use WT structures from AlphaFold (upon proper trimming and quality control with PROCHECK) instead of experimental structures without compromising accuracy, in the publications of the two main protocols of the STABILITY module (MutateX and RosettaDDGPrediction, and a case study on p53: https://doi.org/10.1093/bib/bbac074, https://doi.org/10.1002/pro.4527). In focused studies, we also carefully consider whether a prediction is at a site with a low pLDDT score, or surrounded by other sites with low pLDDT scores, before reaching any conclusions. The pLDDT score is reported in the MAVISp CSV file exactly so that it can be used for flagging variants or looking at them more closely, as we discuss in this study (see, for example, Figure 2). Additionally, it should be noted that we employ a consensus approach across the two classes of methods in MAVISp to account for their limitations arising from their empirical energy functions or backbone stiffness. Furthermore, in the focused studies we also collected molecular dynamics simulations for the ensemble mode and reassessed stability on different conformations from the trajectory, to compensate for the backbone-stiffness issues of the FoldX, RaSP, and Rosetta ΔΔG protocols.
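The consensus logic across method classes can be illustrated with a small sketch. The cutoff values and class labels below are illustrative placeholders, not MAVISp's published classification thresholds:

```python
def classify_ddg(ddg, destab_cutoff=3.0, stab_cutoff=-3.0):
    """Classify a predicted ΔΔG (kcal/mol); cutoffs here are placeholders."""
    if ddg >= destab_cutoff:
        return "Destabilizing"
    if ddg <= stab_cutoff:
        return "Stabilizing"
    return "Neutral"

def consensus(ddg_foldx, ddg_rosetta_or_rasp):
    """Report an effect only when both method classes agree; else 'Uncertain'."""
    a = classify_ddg(ddg_foldx)
    b = classify_ddg(ddg_rosetta_or_rasp)
    return a if a == b else "Uncertain"
```

A variant flagged as destabilizing by one method but neutral by the other would thus be reported as uncertain rather than called either way, which is the intent of the consensus design.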
I have to add that this is also true for the technical choices: Several integrated predictors (DeMaSk, GEMME) are outperformed by newer methods according to benchmarking studies (https://www.embopress.org/doi/full/10.15252/msb.202211474). AlphaMissense, while state-of-the-art, shows substantial overcalling of pathogenic variants. could ensemble meta-predictors (REVEL, BayesDel) improve accuracy?
The MAVISp framework includes REVEL as one of the VEPs available for data analysis, so one of the ensemble meta-predictors is already represented. This is explained in the original MAVISp paper. We were not aware of BayesDel, which we will consider for one of the next pilot projects to assess new tools for the framework (see below for how we generally proceed). Currently, we cannot use REVEL for all variants because we do not necessarily have genomic coordinates for them. We retrieve genomic-level variants corresponding to our protein variants from mutation databases, where available (e.g., ClinVar, COSMIC, or cBioPortal). However, as we strive to cover every possible mutation, several of the variants in MAVISp are not in these databases, which means we do not have the corresponding genomic variation for them, limiting our ability to annotate them with such VEPs. In the future (see GitHub issue https://github.com/ELELAB/cancermuts/issues/235), we will revise the code to identify the genomic variants that could give rise to each protein mutation of interest, thereby increasing the coverage of VEP annotations.
We can see from the work cited by the reviewer that ESM-1v, EVE, and DeepSequence are among the top performers, whereas Reviewer 2 cited another work in which GEMME outperforms EVE. We have been covering all of them, except ESM-1v, in our framework. We are planning to evaluate some of the new top-performing predictors, including ESM-1v, for inclusion in MAVISp in Q2 2026 (according to the protocol described later in this answer), which is why it is not available yet.
In our discovery protocol (i.e., when we work on VUS or variants not classified in ClinVar), we generally use AlphaMissense as the first indicator of potentially damaging variants. EVE, REVEL, or GEMME can be used when AlphaMissense data are missing, or as a second layer of evidence when, for example, we want to select a smaller pool of variants for experimental validation in a protein target with too many uncharacterized variants passing the evaluation of our discovery workflow. Finally, we rely on DeMaSk, as it also provides information on possible loss- or gain-of-fitness signatures to further filter the variants of interest in the search for mechanistic indicators. Since the MAVISp framework is modular, other users may want to use the data differently and design a different workflow; they have access to the data (scores and classifications) through the web portal. The fact that we combine AlphaMissense with DeMaSk means the final results come after further variant filtering, which mitigates the risk that AlphaMissense over-predicts pathogenicity.
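The discovery filtering logic just described can be sketched as a small decision function. The field names, label strings, and fallback order below are illustrative assumptions, not the MAVISp implementation.

```python
# Hedged sketch of the discovery-workflow filter: AlphaMissense first,
# other VEPs as a fallback, DeMaSk fitness signatures as a second filter.

def discovery_filter(variant: dict) -> bool:
    """Flag a variant as a candidate for mechanistic follow-up."""
    am = variant.get("alphamissense")  # "pathogenic" / "benign" / None
    if am is not None:
        primary = (am == "pathogenic")
    else:
        # fall back to other VEPs when AlphaMissense data are missing
        primary = (variant.get("eve") == "pathogenic"
                   or variant.get("revel") == "pathogenic"
                   or variant.get("gemme") == "pathogenic")
    # DeMaSk loss/gain-of-fitness signatures further filter the candidates
    return primary and variant.get("demask") in {"loss_of_fitness",
                                                 "gain_of_fitness"}

print(discovery_filter({"alphamissense": "pathogenic",
                        "demask": "loss_of_fitness"}))  # True
```

Because the second filter is conjunctive, AlphaMissense calls that lack a supporting DeMaSk signature are dropped, which is how over-calling by a single predictor is dampened.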
In general, we work to keep MAVISp up-to-date, and we have developed a protocol for the inclusion of new methodologies in the available modules before generating and releasing data with new tools in the database. In particular, we perform comparative studies using data already available in the database to evaluate the performance of new approaches against that of the tools already included. Depending on the module, we use different gold standards, which we curate in parallel and which make sense for that specific module. For example, to evaluate a VEP, we would compare it against known ClinVar variants with a good review status. If the VEP performs better than the currently included ones, we can include it as an additional source of annotations and evaluate whether to change the protocol for the discovery/characterization of variants. We operate similarly for the structural modules. For example, for stability, we are importing experimental data from MAVE assays on protein abundance and using them as a gold standard against which we evaluate new approaches relative to the current FoldX- and Rosetta-based consensus for changes in folding free energies. If we find evidence suggesting that switching to a new method, or integrating it, would be beneficial, we will do so as a result of these investigations. An example of our working mode for evaluating tools for inclusion in the framework is how we handled the comparison between RaSP and Rosetta in the original MAVISp article (Supplementary File S2) before officially switching to RaSP for high-throughput data collection. We still maintain Rosetta, especially in focused studies, to further validate variants classified as uncertain.
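The comparison of a candidate VEP against a ClinVar-derived gold standard can be sketched as a ranking metric. The scores below are invented for illustration; the actual benchmarking protocol is more elaborate than a single AUC.

```python
# Sketch: ROC AUC via the Mann-Whitney statistic -- the probability that a
# randomly chosen pathogenic variant scores above a randomly chosen benign
# one under the candidate predictor.

def roc_auc(pathogenic_scores, benign_scores):
    """Rank-based AUC; ties count as half a win."""
    pairs = len(pathogenic_scores) * len(benign_scores)
    wins = sum(
        1.0 if p > b else 0.5 if p == b else 0.0
        for p in pathogenic_scores
        for b in benign_scores
    )
    return wins / pairs

# Invented calibration data: scores of known pathogenic vs. benign variants.
print(roc_auc([0.9, 0.8, 0.7], [0.2, 0.4, 0.75]))  # 8/9, about 0.889
```

A new VEP whose AUC on the curated ClinVar set exceeds that of the predictors already in the framework would be a candidate for inclusion.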
*Further, I found the web site of the framework, where I looked for the data on these models, rather user-unfriendly. Selecting POLD1, POLD2, or POLE tells me I am viewing entries A2ML1, ABCB11, or ABCB6, respectively, when I search for POL and then click: these are the first three entries of the table, not what I clicked on. Displaying the whole table and clicking on POLD1 gets me to POLD1. However, when I selected "Damaging mutations on structure" I got "Could not fetch protein structure model from the AlphaFold Protein Structure Database". Many other features are not working (Safari or Chrome, on a Mac). That is a concern for the usability of the dataset.*
We have been able to reproduce the bugs identified by the reviewer and have fixed them. The second was connected to recent updates of the AlphaFold Protein Structure Database. We are not sure how to act on the "other features that are not working" due to the lack of specificity of this comment. Still, we have worked to make the website more robust: the coauthors of this work and other colleagues in the MAVISp team have extensively tested it across different proteins and with various browsers and operating systems, and we have fixed all identified issues. We also have a GitHub repository where users can open issues to report problems they experience with the website, which we will fix as promptly as we can (https://www.github.com/ELELAB/MAVISp), as we do for any of the tools we develop and maintain. If the reviewer comes across other specific problems with the website, we recommend (anonymously) opening issues on the MAVISp repository so that they can be described in more detail and dealt with appropriately.
This comment seems more related to the MAVISp paper itself than to the POLE and POLD1 entries. We have made several revisions to the web app to improve it over time, and we suspect the reviewer consulted it during one of these changes; we hope it works better now. For POLE and POLD1, the CSV files were, in any case, also available through the MAVISp website itself (https://services.healthtech.dtu.dk/services/MAVISp-1.0/), as well as in the OSF repository connected to this paper (https://osf.io/z8x4j/overview), in case readers need to consult them or use them as a reference for the analyses reported in this paper.
Albeit this is a thorough analysis with the existing tools, and the authors make some sparse attempts to put the mutant classifications in context with examples, the work stays descriptive for known effects in the literature, or points out that e.g. "further functional and in vitro assays are required". The examples are not presented in a systematic way, or in an appealing manner. Thus, what this manuscript adds to the web site is unclear. It is a description of content, which could at least be more appealing if examples were more clearly outlined in a conceptual framework and illustrated more consistently. For example, I read in the middle of page 16 "One such example is the F931S (p.Phe931Ser) variant (Figure 5A)" and then I see "F931 forms contacts with D626, a critical residue for the coordination of Mg2+ which is essential for the correct orientation of the incoming nucleotide (Figure XXX)". Figure 5B is not XXX, as it just has many mutations labeled. These issues are very discouraging. I would recommend putting much more effort into the examples, putting them in clearer paragraphs, and describing results rather than the methodology. Doing both in an intermingled way clearly does not work for me.
We have revised the storyline to make it more straightforward for the reader, focusing on the essential messages and avoiding excessive description in the results section, instead conveying the key points directly. We also included new simulation data on three variants and downstream analyses of other variants. We revised the section to focus less on methodologies and more on the actual biological results. We have also added a ranking approach for the VUS and an ACMG-like classification to facilitate the identification of the most important results.
Additionally, we included a summary Table (Table 2) and Figure 9 that present the main findings on the VUS, and we discussed in the text the possible associated experimental validation.
We also do not fully understand the reviewer's comment that "the work stays descriptive for known effects in the literature". We agree that we should make a better effort to write the results in a logical and easy-to-follow manner, without the reader getting lost in too many details, and with more dedicated subsections. However, the paper does not describe just known effects from the literature. In the previous version, the very first part of the results was a section aimed at identifying mechanistic indicators for ClinVar-reported variants that are also (in some cases) functionally characterized; even there, the analysis adds structure-based knowledge to these variants. After this, we also reported predicted results with mechanisms for VUS and variants in other databases. We took the opportunity in this revised version to elaborate more on the results for the variants reported in COSMIC and cBioPortal.
We are afraid that we also do not fully understand the reviewer's comment that "what this manuscript adds to the website is unclear". We have generated POLE and POLD1 data with the MAVISp toolkit in both ensemble and simple mode, and the whole pool of local interactions with other proteins and DNA, specifically for this publication. It should be acknowledged that we have generated new data in ensemble mode, which relies on all-atom microsecond molecular dynamics simulations, and additional modules for the *simple mode*, including calculations with the flexddg protocol of Rosetta, which is also computationally demanding, to provide a comprehensive overview of the effects of variants in POLE and POLD1. The two proteins were previously available in the database only in simple mode with the basic default modules, and the remaining data were collected during this research article. This can also be inferred from the references in the CSV file of the ensemble mode, which refer only to the DOI of the preprint of this article. This entails a substantial effort in computing and analysis. The website is the repository for data that researchers collect using the MAVISp protocols or modules; in our opinion, it cannot replace a research project. We designed the database to store the data generated by the framework for others to consult and use for various purposes (e.g., biological studies, preparing datasets for benchmarking approaches against existing ones, or using features for machine learning applications). The entry point in the database is the simple mode, along with some compulsory modules (VEPs, STABILITY, PTM, EFOLDMINE, SASA). After this initial entry point, a biocurator or a team of researchers can decide to expand data coverage by moving into the other modules.
Still, at some point, one would need to design focused studies to have a comprehensive overview of the effects on specific targets, as we did here, or, for example, in the publication https://doi.org/10.1016/j.bbadis.2024.167260.
Furthermore, there are analyses here, especially in the simulations, that are not directly available from consulting the database; in these cases, one needs to use resources beyond MAVISp to further investigate the mechanisms underlying the predicted mechanistic indicators. We also included simulations of mutant variants to further validate the hypotheses. Another example is the analysis of effects on splicing sites, which is not covered by a structure-based framework such as MAVISp but is still an essential aspect of the analysis of variant effects.
Will the community find this analysis useful?
The analysis provided here will be helpful, especially for researchers interested in experimental studies of these enzymes, because it gives them an extensive portfolio of structural data to consult, including a list of variants ranked by class of effect. We originally started designing MAVISp because we realized it was needed by our experimental collaborators, both in cellular biology and in more clinical research, whenever they needed to predict or simulate variants, and we expanded the concept into a robust, versatile framework for broader use. Especially for genes where extensive MAVE data are not available (as in this case), having a set of variants to test experimentally, together with the potential mechanism behind each predicted damaging variant, is crucial support.
How many ClinVar VUS could be reclassified using MAVISp data under current ACMG/AMP guidelines?
The ACMG/AMP variant classification guidelines, to the best of our knowledge, include computational evidence (PP3/BP4) and well-established functional studies (PS3/BS3). Because MAVISp provides multi-level mechanistic predictions derived from structural modelling, these data formally fall within the PP3/BP4 computational category. They cannot be used to reclassify ClinVar VUS independently under ACMG/AMP rules. This is not really the goal of our framework, which is to provide a structure-based approach for investigating potentially damaging variants predicted by VEPs. However, the reviewer's suggestion is something we had wanted to explore with MAVISp data in general, but had not pursued for lack of time. We checked the requirements for PP3, BP4, and PM1 and developed a classifier for VUS reported in ClinVar, using MAVISp features in accordance with the ACMG/AMP guidelines. Using ClinVar pathogenic and benign variants with a review status of at least 1 for calibration, we obtained thresholds for all MAVISp-supported VEPs (REVEL, AlphaMissense, EVE, GEMME, and DeMaSk). These thresholds were then applied to all ClinVar VUS to determine PP3 (pathogenic-supporting) and BP4 (benign-supporting) evidence. In parallel, we constructed a PM1-like mechanistic evidence category that integrates MAVISp structural stability, protein–protein interactions, DNA interactions, long-range allosteric paths, functional sites, and PTM-mediated regulatory effects. Variants classified as damaging in MAVISp according to such criteria were assigned PM1-like support. These evidence tags provide mechanistic insight to support VUS classification for polymerase proofreading genes. The workflow and complete annotated VUS table are now included in the revised manuscript and in the OSF repository.
Although these findings cannot formally reclassify variants under ACMG/AMP criteria, they provide prioritization for PS3/BS3 experimental validation and highlight variants that are likely to be reclassified once supporting functional evidence becomes available.
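The threshold calibration and PP3/BP4 tagging described above can be illustrated with a deliberately crude sketch. Real calibration follows the published ACMG/AMP computational-evidence recommendations (likelihood-ratio based); the extremes-based cutoffs and function names below are assumptions made only for this example.

```python
# Crude sketch: derive score cutoffs from labelled calibration variants,
# then tag VUS as PP3 (pathogenic-supporting) or BP4 (benign-supporting).
# Real ACMG/AMP calibration uses likelihood ratios, not simple extremes.

def calibrate(pathogenic_scores, benign_scores):
    """PP3 cutoff: above every benign score and at least the lowest
    pathogenic score; BP4 cutoff is the mirror image."""
    pp3_cutoff = max(max(benign_scores), min(pathogenic_scores))
    bp4_cutoff = min(min(pathogenic_scores), max(benign_scores))
    return pp3_cutoff, bp4_cutoff

def tag_vus(score, pp3_cutoff, bp4_cutoff):
    if score > pp3_cutoff:
        return "PP3"
    if score < bp4_cutoff:
        return "BP4"
    return "indeterminate"

# Invented calibration scores for known pathogenic and benign variants.
pp3, bp4 = calibrate([0.7, 0.8, 0.95], [0.05, 0.1, 0.3])
print(tag_vus(0.9, pp3, bp4))   # "PP3"
print(tag_vus(0.02, pp3, bp4))  # "BP4"
```

Scores falling between the two cutoffs remain untagged, which mirrors the fact that many VUS receive neither PP3 nor BP4 evidence in our analysis.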
*How do MAVISp predictions meet calibrated thresholds, as in https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-023-01234-y, for the exonuclease domain of POLE and POLD1?*
Mur et al. (Genome Medicine 2023) restricted their ACMG/AMP recommendations to the exonuclease domain (ED) because (i) nearly all known pathogenic germline variants in POLE/POLD1 cluster within the ED, (ii) the ED has a well-characterised structure–function architecture, and (iii) sufficient pathogenic and benign variants exist only within the ED to support empirical calibration. To mirror this approach, we performed the calibration workflow exclusively on ED variants (POLE residues 268–471; POLD1 residues 304–533). For these ED-restricted variants, we recalibrated all MAVISp-derived computational predictors (REVEL, AlphaMissense, EVE, GEMME, DeMaSk) using ClinVar P/LP and B/LB variants. We applied the resulting POLE/POLD1-specific thresholds to all ClinVar VUS within the ED. We also applied our PM1-like structural/functional evidence exclusively to ED variants. The results of this ED-specific analysis are now reported in the revised manuscript (Figure 9, Supplementary Tables S3 and S4), as also explained in the response to the previous question. This ensures that MAVISp predictions are applied in a manner that is consistent with the principles of Mur et al. and ACMG/AMP variant interpretation.
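Restricting the analysis to the exonuclease domain amounts to a residue-range filter on the variant list. The sketch below uses the ED boundaries stated above; the variant-string format and function name are illustrative assumptions.

```python
# Sketch: keep only missense variants falling in the exonuclease domain,
# using the ED boundaries given in the text (POLE 268-471, POLD1 304-533).
import re

ED_RANGES = {"POLE": range(268, 472), "POLD1": range(304, 534)}

def in_exonuclease_domain(gene: str, variant: str) -> bool:
    """True if a variant written like 'P297S' falls in the gene's ED."""
    m = re.fullmatch(r"[A-Z](\d+)[A-Z]", variant)
    if m is None:
        raise ValueError(f"not a simple missense variant: {variant!r}")
    return int(m.group(1)) in ED_RANGES[gene]

print(in_exonuclease_domain("POLE", "P297S"))  # True  (ED variant)
print(in_exonuclease_domain("POLE", "F931S"))  # False (polymerase domain)
```

Applying such a filter before calibration keeps the workflow consistent with the ED-only scope of Mur et al.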
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #3
Evidence, reproducibility and clarity
The manuscript used the MAVISp framework to characterize 64,429 missense variants (43,415 in POLE, 21,014 in POLD1) through computational saturation mutagenesis. The authors integrate protein stability predictions with pathogenicity predictors to provide mechanistic insights into DNA polymerase variants relevant to cancer predisposition and immunotherapy response. There are discussions of known PPAP-associated variants and somatic cancer mutations in the context of known data and some proposed variants of interest (which are not validated).
Major comments:
I was unaware of the MAVISp framework. It concerns me that albeit this paper has a lot of technical details about the framework, it is not the paper about the framework. I did look into the paper https://www.biorxiv.org/content/10.1101/2022.10.22.513328v5, which keeps being updated (version five now) for three years, but I do not see a peer-reviewed version. It would be unfair of me to peer review the underlying framework of the work, but together with the previous comments, I am a bit concerned. For example, the authors are using AlphaFold models to predict ΔΔG values. Delgado et al. (2025, Bioinformatics) explicitly tested FoldX on such models and concluded that "AlphaFold2 models are not suitable for point mutation ΔΔG estimation" after observing a correlation of 0.06 between experimental and calculated values. AlphaFold's own documentation states it "has not been validated for predicting the effect of mutations". Pak et al. (2023, PLOS ONE) showed a correlation between AlphaFold confidence metrics and experimental ΔΔG of -0.17. Needless to say, these concerns seriously undermine the validity of a major part of the study. I have to add that this is also true for other technical choices: several integrated predictors (DeMaSk, GEMME) are outperformed by newer methods according to benchmarking studies (https://www.embopress.org/doi/full/10.15252/msb.202211474). AlphaMissense, while state-of-the-art, shows substantial overcalling of pathogenic variants. Could ensemble meta-predictors (REVEL, BayesDel) improve accuracy?
Further, I found the web site of the framework, where I looked for the data on these models, rather user-unfriendly. Selecting POLD1, POLD2, or POLE tells me I am viewing entries A2ML1, ABCB11, or ABCB6, respectively, when I search for POL and then click: these are the first three entries of the table, not what I clicked on. Displaying the whole table and clicking on POLD1 gets me to POLD1. However, when I selected "Damaging mutations on structure" I got "Could not fetch protein structure model from the AlphaFold Protein Structure Database". Many other features are not working (Safari or Chrome, on a Mac). That is a concern for the usability of the dataset.
Albeit this is a thorough analysis with the existing tools, and the authors make some sparse attempts to put the mutant classifications in context with examples, the work stays descriptive for known effects in the literature, or points out that e.g. "further functional and in vitro assays are required". The examples are not presented in a systematic way, or in an appealing manner. Thus, what this manuscript adds to the web site is unclear. It is a description of content, which could at least be more appealing if examples were more clearly outlined in a conceptual framework and illustrated more consistently. For example, I read in the middle of page 16 "One such example is the F931S (p.Phe931Ser) variant (Figure 5A)" and then I see "F931 forms contacts with D626, a critical residue for the coordination of Mg2+ which is essential for the correct orientation of the incoming nucleotide (Figure XXX)". Figure 5B is not XXX, as it just has many mutations labeled. These issues are very discouraging. I would recommend putting much more effort into the examples, putting them in clearer paragraphs, and describing results rather than the methodology. Doing both in an intermingled way clearly does not work for me.
Will the community find this analysis useful? How many ClinVar VUS could be reclassified using MAVISp data under current ACMG/AMP guidelines? How do MAVISp predictions meet calibrated thresholds, as in https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-023-01234-y, for the exonuclease domain of POLE and POLD1? Such questions might undermine the appeal of the work and could be looked into.
Referee cross-commenting
I agree with all the comments raised by Reviewer 2; she/he elaborates more on some issues I brought up too briefly (e.g. the choice of GEMME), while other issues that I made more comments about are also mentioned. I only want to note that the statement "A major limitation of the 3D modeling is this impossibility to include Zn2+ coordination by cysteine residues" is not accurate, as there are many 3D structure prediction and modeling tools that are capable of handling zinc ions coordinated by cysteines.
While I respect that Referee 1 is clearly more positive and less concerned by methodological issues, I note that while I agree that "The authors identify numerous variants for prioritisation in further studies" (albeit in a sparse and not well organised manner in my view), I am not convinced by the present manuscript that "the effectiveness of integrating various data sources for inferring the mechanistic impact of variants" is really shown: there are hypotheses generated, but none are tested, so the effectiveness of the approach remains to be proven in my view.
I still view this as a thorough study and a very brave attempt to be integrative and inclusive, but several methodological limitations and lack of concrete novel insight, seriously dampen my enthusiasm.
Significance
Strengths:
A very comprehensive analysis of POLE and POLD1 missense variants (64,429 total), approximately 600-fold more coverage than the ~100 experimentally characterized variants in the PolED database. The multi-layered MAVISp approach provides mechanistic interpretability beyond simple pathogenic/benign classifications, potentially valuable for understanding variant effects on stability, DNA binding, protein interactions, and allosteric communication. The clinical context is highly relevant given POLE/POLD1 roles in disease.
Limitations:
The methodological concerns were outlined above. No solid new insights are presented in a validated manner. Examples of how the datasets can really be used are not well organised, as they appear intermingled with the description of the approach in a perplexing manner.
Advance:
The advance is primarily technical and database-driven rather than conceptually novel. The scale, multi-dimensional assessment, mechanistic insight, and consideration of clinical framework integration are a clear advance.
Audience:
The audience is the POLD1/POLE experts; I, however, doubt that clinical scientists will find the paper useful, especially given the absence of a dedicated resource, the fact that the entries in the MAVISp web tool are not easily and intuitively accessible, and the fact that clinical requirements (e.g., integration with ACMG/AMP classification frameworks) are not clearly met.
Reviewer expertise: I am a structural biologist with experience in structure analysis of experimental and predicted models, but no specific expertise or interest in polymerases.
-
Referee #2
Evidence, reproducibility and clarity
This manuscript reports a comprehensive study of POLE and POLD1 annotated clinical variants using a recently developed framework, MAVISp, that leverages scores and classifications from evolutionary-based variant effect predictors. The resource can be useful for the community. However, I have a number of major concerns regarding the methodology, the presentation of the results and the impact of the work.
On the choice of tools in MAVISp and interpretation of their outputs
- Based on the ProteinGym benchmark: https://proteingym.org/benchmarks, GEMME outperforms EVE for predicting the pathogenicity of ClinVar mutations, with an AUC of 0.919 for GEMME compared to 0.914 for EVE. Thus, it is not clear for me why the authors chose to put more emphasis on EVE for predicting mutation pathogenicity. It seems that GEMME can better predict this property, without any adaptation or training on clinical labels.
- Which of the predictors, among AM, EVE, GEMME, and DeMaSk, provide a classification of variants, and which ones provide continuous scores? This should be clarified in the text. If some predictors do not output a classification, then evaluating their performance on a classification task is unfair. I would guess that the MAVISp framework sets thresholds on the predicted scores to perform the classification, and it is unclear from reading the manuscript whether these thresholds are optimal or whether using universal cutoff values is pertinent. For instance, for GEMME, a recent study shows that fitting a Gaussian mixture to the predicted score distribution yields higher accuracy than setting a universal threshold (https://doi.org/10.1101/2025.02.09.637326). Along this line, for predictors that do not provide a classification, I am not convinced of the benefit for the users of having access to only binary labels instead of the continuous scores. The users currently have no idea whether each variant is borderline (close to the threshold) or confident (far from the threshold).
On the presentation and impact of the results
- While reading the manuscript, it is difficult to grasp the main messages. The text contains abundant discussion about the potential caveats of the framework, the care that should be taken in interpreting the results, and the dependency on the clinical context. Although these aspects are certainly important, this extensive discussion (spread throughout the manuscript) obscures the results. Moreover, the way variants are catalogued throughout the text makes it difficult to grasp the key highlights. The reader is left unsure about whether the framework can actually help clinical practitioners.
- In many cases, the authors state that experimental validation is required to validate the results. Could they be more explicit on the experimental design and the expected outcome?
- AlphaMissense seems to have a tendency to over-predict pathogenicity. Could the authors comment on that?
On specific variants
- The mention of H1066R, H1068, and D1068Y is very confusing. There seems to be a confusion between residue numbers and amino acid types.
- A major limitation of the 3D modeling is the impossibility to include Zn2+ coordination by cysteine residues. This limitation holds for both POLE and POLD1. Could the authors comment on the implications of this limitation for interpreting the mechanistic impact of variants? In particular, several variants reported in the study consist of gains of cysteines. The authors discuss the potential impact of some of these mutations on structural stability, but not on Zn coordination or the formation of disulphide bridges.
- MAVISp does not identify any mechanistic effect for a substantial portion of variants labelled as pathogenic. Could the authors comment on this point? Moreover, none of the variants discussed are associated with allosteric effects; is this expected?
Referee cross-commenting
I agree with the comments and overall assessment of Reviewer 3. I would like to take this opportunity to clarify that I did not mean that 3D modelling of zinc ion coordination by cysteines is impossible in general. I wanted to emphasise that the exclusion of some zinc-binding sites in the present study is a limitation.
Significance
The work's strength is its comprehensive analysis. The weaknesses are a methodology that does not seem mature and outputs that are still difficult to interpret. In addition, it seems that a lot of expertise and manual curation based on metadata (phenotype, functional state...) is needed for users to benefit from the analysis. The manuscript reads a bit like a catalogue from which it is difficult to understand to what extent the results are significant and impactful.
I have expertise in computational modelling, protein sequence-structure-function relationship and prediction of variant effects.
-
Referee #1
Evidence, reproducibility and clarity
This study "Interpreting the Effects of DNA Polymerase Variants at the Structural Level" comprises an in-depth analysis of protein sequence variants in two DNA polymerase enzymes with particular emphasis on deducing the mechanistic impact in the context of cancer. The authors identify numerous variants for prioritisation in further studies, and showcase the effectiveness of integrating various data sources for inferring the mechanistic impact of variants.
All the comments below are minor, I think the manuscript is exceptionally well written.
- The main body of the manuscript has almost as much emphasis on usage of the MAVISp tool as analysis of the polymerase variants. I don't think this is an issue, as an illustrated example of proper usage is very handy. I do however, think that the title and abstract should better reflect this emphasis. E.g. "Interpreting the Effects of DNA Polymerase Variants at the Structural Level with MAVISp". This would make the paper more discoverable to people interested in learning about the tool.
- Figure 1. I don't believe there is much value in showing the intersection between the datasets (especially since the in-silico saturation dataset intersects perfectly with all the others). As an alternative, I suggest a flow-chart or similar visual overview of the analysis pipeline.
- Please note in the MAVISp dot-plot figure legends that the second key refers to the colour of the X-axis labels rather than the dots
- Missing figure reference (Figure XXX) at the bottom of page 16
Significance
In addition to identifying a large number of variants in POLE and POLD1 that are good candidates for further investigation, this study acts as a showcase for how evidence from different sources can be combined in a context-dependent manner (in this case, cancer). In terms of limitations, the lack of structures for POLD1 proved a hindrance several times in this study, obscuring some potentially pathogenic variants.
This manuscript bears some similarity to the MAVISp paper (10.1101/2022.10.22.513328v6), which gives a number of brief examples of analyses that can be conducted with the pipeline. This paper differs in that it shows a full start-to-end analysis and goes into considerable detail about possible mechanisms of pathogenesis. The mechanistic detail and potential clinical relevance set this paper apart.
This paper would most likely interest those involved in VUS interpretation, both in a clinical and research capacity. Those specialised in DNA polymerase enzymes would also likely find this paper of interest. This paper may also be useful for future research into the impact of the highlighted variants.
-
