Ensemble Refinement of mismodeled cryo-EM RNA Structures Using All-Atom Simulations
This article has been reviewed by the following group: PREreview (listed in Evaluated articles).
Abstract
The advent of single-particle cryogenic electron microscopy (cryo-EM) has enabled near-atomic resolution imaging of large macromolecules, enhancing functional insights. However, current cryo-EM refinement tools condense all single-particle images into a single structure, which can misrepresent highly flexible molecules like RNAs. Here, we combine molecular dynamics simulations with cryo-EM density maps to better account for the structural dynamics of a complex and biologically relevant RNA macromolecule. Namely, using metainference, a Bayesian method, we reconstruct an ensemble of structures of the group II intron ribozyme, which better match experimental data, and we reveal inaccuracies of single-structure approaches in modeling flexible regions. An analysis of all RNA-containing structures deposited in the PDB reveals that this issue affects most cryo-EM structures in the 2.5-4 Å range. Thus, RNA structures determined by cryo-EM require careful handling, and our method may be broadly applicable to other RNA systems.
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/14004843.
Summary
Here, the authors applied cryo-EM-guided, metainference-based MD simulations to identify modeling inaccuracies in flexible helical regions that arise from representing them with a single-structure model. They first applied their approach to the group II intron ribozyme from Thermosynechococcus elongatus, one of the few cryo-EM structures in the PDB that consist mostly of RNA and were obtained using a single-structure refinement procedure. They confirmed that remodeling with their approach mostly affected flexible, solvent-exposed stem loops that are not phylogenetically conserved. Functional domains that were well ordered, phylogenetically conserved, and clearly resolved in the cryo-EM density required no remodeling. Extending this analysis to other PDB structures revealed that poor modeling of flexible helical regions affects most RNA-containing cryo-EM structures in the 2.5-4 Å resolution range.
Major Points:
It is difficult to keep track of the different parameters used in each simulation, as they are scattered throughout the text. It would help the reader if the authors provided a table of all MD simulations specifying the helices restrained, the approach, simulation time, force fields, equilibration time, etc.
Clarification regarding the helical restraints and the simulation time of the initial production run would be helpful. For example, why was a simulation time of 2.5 ns chosen, and is this long enough given the relatively high complexity of the system?
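Whether a short run is long enough depends on the correlation time of the observables of interest. A generic way to check this is block averaging: if the standard error of the mean stops changing as the blocks get longer, the run contains several effectively independent blocks; if it keeps growing, it does not. This is a minimal sketch, not the authors' protocol; `block_average_sem` is a hypothetical helper and the AR(1) series is a synthetic stand-in for a per-frame observable.

```python
import numpy as np


def block_average_sem(series, n_blocks):
    """Standard error of the mean from non-overlapping, equal-length blocks.

    When blocks are longer than the correlation time of the observable, the
    block means are roughly independent, so the estimate plateaus as the
    blocks grow (n_blocks shrinks). No plateau suggests the run is too short.
    """
    series = np.asarray(series, dtype=float)
    block_len = len(series) // n_blocks
    if n_blocks < 2 or block_len == 0:
        raise ValueError("need at least 2 blocks and 1 sample per block")
    # Drop trailing frames so that every block has the same length.
    blocks = series[: block_len * n_blocks].reshape(n_blocks, block_len)
    block_means = blocks.mean(axis=1)
    return float(block_means.std(ddof=1) / np.sqrt(n_blocks))


rng = np.random.default_rng(0)
# Synthetic AR(1) time series standing in for a per-frame observable
# (hypothetical, e.g. the fraction of native base pairs in one helix).
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.99 * x[t - 1] + rng.normal()
for n in (500, 100, 50, 20, 10, 5):
    print(f"{n:4d} blocks: SEM = {block_average_sem(x, n):.3f}")
```

For a strongly correlated signal like this one, the estimate keeps rising as blocks lengthen, which is the warning sign to look for in a real trajectory.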
Were restraints applied to the 3 helices (b, d, i) that unfolded in subsequent simulations, or only to the other 6 helices?
The section "Base Pairing Analysis of the Protein Data Bank" states that the analysis likely overestimates the problematic modeling of helices in cryo-EM-derived RNA structures. Given that the 15 Å threshold is larger than the expected N1/N9 distance of 8-11 Å (Pietal, M. J., 2012), would the analysis not instead underestimate the number of problematic helices?
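To make the direction of the bias concrete, the sketch below assumes (our reading, not confirmed by the manuscript) that a base pair is flagged as broken when the distance between its glycosidic N atoms (N1 of pyrimidines, N9 of purines) exceeds a threshold. Under that criterion, a looser threshold flags fewer pairs. Function names and coordinates are hypothetical toy values.

```python
import numpy as np


def n1_n9_distances(pairs):
    """Distances (Å) between the glycosidic N atoms of each annotated pair.

    `pairs` is a sequence of (xyz_a, xyz_b) coordinate tuples.
    """
    a = np.array([p[0] for p in pairs], dtype=float)
    b = np.array([p[1] for p in pairs], dtype=float)
    return np.linalg.norm(a - b, axis=1)


def count_flagged_pairs(pairs, cutoff):
    """Number of pairs farther apart than `cutoff`, i.e. treated as broken
    under the hypothetical criterion described above."""
    return int(np.count_nonzero(n1_n9_distances(pairs) > cutoff))


# Toy pairs: second atom placed along x at the listed separations (Å).
separations = [9.0, 10.5, 12.0, 13.8, 16.0]
toy_pairs = [((0.0, 0.0, 0.0), (d, 0.0, 0.0)) for d in separations]

for cutoff in (11.0, 15.0):
    print(cutoff, count_flagged_pairs(toy_pairs, cutoff))
```

With these toy values, an 11 Å threshold flags three pairs while a 15 Å threshold flags only one.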
A detailed analysis of hydrogen bonding within the helices would help provide a mechanistic understanding of why 3 helices (b, d, i) unfold while the other 6 do not. For example, a comparison with the single-structure model would show whether the same hydrogen bonds are present there as in the ensemble proposed in this paper.
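One way to carry out such a comparison is to list the donor-acceptor contacts in the single-structure model and report how often each is present across the refined ensemble. This sketch assumes a simple distance-only criterion (heavy atoms closer than 3.5 Å, no angle check) and uses hypothetical atom labels and coordinates; dedicated analysis tools apply stricter geometric definitions.

```python
from collections import Counter

import numpy as np


def find_hbonds(donors, acceptors, cutoff=3.5):
    """Set of (donor, acceptor) label pairs closer than `cutoff` Å.

    Distance-only criterion on heavy atoms; angles are ignored.
    `donors` and `acceptors` map atom labels to xyz coordinates.
    """
    found = set()
    for d_label, d_xyz in donors.items():
        for a_label, a_xyz in acceptors.items():
            if np.linalg.norm(np.subtract(d_xyz, a_xyz)) < cutoff:
                found.add((d_label, a_label))
    return found


def hbond_occupancy(frames, cutoff=3.5):
    """Fraction of ensemble frames in which each H-bond is present."""
    counts = Counter()
    for donors, acceptors in frames:
        counts.update(find_hbonds(donors, acceptors, cutoff))
    return {hb: n / len(frames) for hb, n in counts.items()}


def compare_to_single_model(single_hbonds, occupancy):
    """Ensemble occupancy of each H-bond in the single model, plus the
    H-bonds that only the ensemble forms."""
    in_single = {hb: occupancy.get(hb, 0.0) for hb in single_hbonds}
    ensemble_only = set(occupancy) - set(single_hbonds)
    return in_single, ensemble_only


# Hypothetical G12-C40 pair; in the second frame C40 has moved away.
donors = {"G12:N1": (0.0, 0.0, 0.0), "G12:N2": (0.0, 2.0, 0.0)}
paired = {"C40:N3": (2.9, 0.0, 0.0), "C40:O2": (2.9, 2.0, 0.0)}
apart = {"C40:N3": (6.0, 0.0, 0.0), "C40:O2": (6.0, 2.0, 0.0)}

single = find_hbonds(donors, paired)
occupancy = hbond_occupancy([(donors, paired), (donors, apart)])
print(compare_to_single_model(single, occupancy))
```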
eRMSD approach: relatively little is known about non-Watson-Crick base pairing, apart from Hoogsteen and G-U wobble pairs, because so much depends on context (the specific region of the structure, ion concentration, pH, etc.).
Balance between experimental measurements and MD-sampled conformational states: experimental validation is lacking when mismodeled RNA regions are remodeled, so how do we know when specific interactions are too ideal?
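For context on the eRMSD point: as we understand it, eRMSD compares structures through the relative positions of bases expressed in each other's local frames, mapped onto G-vectors that vanish beyond a cutoff. The metric therefore encodes only base-base geometry and carries no information on ions or pH. This sketch follows our reading of the definition in Bottaro, Di Palma & Bussi (2014); the constants should be checked against the implementation in the barnaba package, and building base frames from atomic coordinates is omitted (relative positions are taken as input).

```python
import numpy as np

# Parameters as we read them in Bottaro, Di Palma & Bussi (2014); verify
# against the reference implementation in barnaba before relying on them.
AXES = np.array([5.0, 5.0, 3.0])  # Å; anisotropic scaling of x, y, z in the base frame
CUTOFF = 2.4                      # dimensionless cutoff on the scaled distance
GAMMA = np.pi / CUTOFF


def g_vectors(rel_pos):
    """Map relative base positions to 4-D G-vectors.

    rel_pos has shape (N, N, 3): rel_pos[i, j] is the position of base j's
    centre expressed in the local reference frame of base i (in Å).
    """
    r = rel_pos / AXES
    norm = np.linalg.norm(r, axis=-1)
    g = np.zeros(rel_pos.shape[:-1] + (4,))
    # Pairs at or beyond the cutoff (and the i == j diagonal) stay zero.
    inside = (norm > 0.0) & (norm < CUTOFF)
    n = norm[inside]
    g[inside, :3] = (np.sin(GAMMA * n) / (GAMMA * n))[:, None] * r[inside]
    g[inside, 3] = (1.0 + np.cos(GAMMA * n)) / GAMMA
    return g


def ermsd(rel_pos_a, rel_pos_b):
    """eRMSD between two structures with the same N nucleotides."""
    n_nt = rel_pos_a.shape[0]
    diff = g_vectors(rel_pos_a) - g_vectors(rel_pos_b)
    return float(np.sqrt(np.sum(diff**2) / n_nt))


def two_bases(dz):
    """Toy two-nucleotide system: base 1 sits dz Å above base 0 (and vice versa)."""
    rel = np.zeros((2, 2, 3))
    rel[0, 1] = (0.0, 0.0, dz)
    rel[1, 0] = (0.0, 0.0, -dz)
    return rel


stacked, far, farther = two_bases(3.4), two_bases(20.0), two_bases(30.0)
print(ermsd(stacked, far), ermsd(far, farther))
```

Because G-vectors vanish beyond the cutoff, two conformations in which a pair of bases is separated by different large distances are indistinguishable to the metric, whereas breaking a close contact is penalized.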
How is the variability among force-field parameters for divalent metal ions (e.g., Mg2+) and other ions addressed, given that these ions are required for proper RNA folding?
Dispersion interactions play a major role in RNA structural integrity. How confident can we be in these parameters, given the differences among nucleic-acid-specific force fields? Could such differences cause the 3 properly base-paired helices to unfold?
Regarding the sentence on page 4, "The trajectories obtained with metainference simulations were then analyzed by back-calculating the corresponding averaged density map and comparing it to the experimental one": was solvent density included in the back-calculated map?
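Whether solvent is included can matter, because ordered water and ions near RNA can contribute density. The sketch uses a deliberately simple forward model (unit-weight Gaussians on every atom, averaged over replicas), not the one used in the manuscript, only to show where that choice enters. `gaussian_density` and `ensemble_map` are hypothetical helpers.

```python
import numpy as np


def gaussian_density(atoms, grid, sigma=1.0):
    """Sum of unit-weight isotropic Gaussians (width sigma, Å) at grid points."""
    atoms = np.asarray(atoms, dtype=float).reshape(-1, 3)
    grid = np.asarray(grid, dtype=float).reshape(-1, 3)
    d2 = np.sum((grid[:, None, :] - atoms[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1)


def ensemble_map(frames, grid, include_solvent, sigma=1.0):
    """Replica-averaged map; each frame is a (solute_xyz, solvent_xyz) pair."""
    maps = []
    for solute, solvent in frames:
        solute = np.asarray(solute, dtype=float).reshape(-1, 3)
        if include_solvent:
            solvent = np.asarray(solvent, dtype=float).reshape(-1, 3)
            atoms = np.vstack([solute, solvent])
        else:
            atoms = solute
        maps.append(gaussian_density(atoms, grid, sigma))
    return np.mean(maps, axis=0)


# One solute atom at the origin and one ordered water 3 Å away.
frames = [([[0.0, 0.0, 0.0]], [[3.0, 0.0, 0.0]])]
grid = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
print(ensemble_map(frames, grid, include_solvent=False))
print(ensemble_map(frames, grid, include_solvent=True))
```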
How much of the CC_mask arises simply from fitting the rigid part of the structure, and how much from the improved fit of the flexible regions in the 9 loops? Would it be possible to separate these contributions?
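Assuming CC_mask is the Pearson correlation between model and experimental maps over voxels inside a model-derived mask, one way to separate the contributions is to split that mask into a rigid-core part and a flexible-loop part and report the correlation in each. The maps and the split below are random, hypothetical toy data.

```python
import numpy as np


def masked_cc(model_map, exp_map, mask):
    """Pearson correlation between two maps over voxels where mask is True."""
    a = model_map[mask].astype(float)
    b = exp_map[mask].astype(float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


rng = np.random.default_rng(1)
exp_map = rng.normal(size=(20, 20, 20))

# Hypothetical split of the model mask into a rigid core and flexible loops.
rigid = np.zeros(exp_map.shape, dtype=bool)
rigid[:10] = True
flexible = ~rigid

# Toy model map: perfect in the core, only partly matching in the loops.
model_map = exp_map.copy()
model_map[flexible] = 0.5 * exp_map[flexible] + rng.normal(size=flexible.sum())

for name, mask in [("rigid", rigid), ("flexible", flexible), ("all", rigid | flexible)]:
    print(name, round(masked_cc(model_map, exp_map, mask), 3))
```

The correlation over the full mask is not a weighted average of the regional values, so reporting the regional CCs side by side is more informative than subtracting one from the other.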
Minor Points:
Could Fig. 2C be shown as a violin or box plot rather than a bar plot? This would show the distribution of CC_mask under each condition, including any clusters representing conformational states that agree with the experimental data.
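A bar plot reports one number per condition, whereas a box plot summarizes the spread of per-frame (or per-replica) CC_mask values. This sketch computes the statistics a box plot displays, using Tukey's 1.5 × IQR whisker rule, which is also matplotlib's default for `Axes.boxplot`; `Axes.violinplot` would additionally reveal multimodality such as conformational clusters. The CC values are hypothetical.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, Tukey whiskers and outliers for one set of values."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Whiskers end at the most extreme values still within the fences.
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": v[(v < low_fence) | (v > high_fence)].tolist(),
    }


# Hypothetical per-frame CC_mask values for two refinement conditions.
conditions = {
    "single replica": [0.61, 0.62, 0.62, 0.63, 0.64, 0.64, 0.65, 0.52],
    "32 replicas": [0.66, 0.70, 0.71, 0.72, 0.74, 0.75, 0.77, 0.79],
}
for name, values in conditions.items():
    print(name, box_stats(values))
```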
Was local resolution considered when plotting the RMSF? A local-resolution estimate for the map would help the reader judge whether the helices unfold simply because these regions are poorly defined in the map or because they are inherently flexible.
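To relate flexibility to map quality, per-nucleotide RMSF could be compared with the local resolution sampled at the same atoms. A minimal sketch, assuming a pre-superposed trajectory array and one local-resolution value per nucleotide already extracted from a local-resolution map (both hypothetical inputs); a rank correlation avoids assuming a linear relationship.

```python
import numpy as np


def rmsf(traj):
    """Per-atom RMSF (Å) for a trajectory of shape (n_frames, n_atoms, 3).

    Frames are assumed to be superposed on a common reference already.
    """
    traj = np.asarray(traj, dtype=float)
    dev = traj - traj.mean(axis=0)
    return np.sqrt(np.mean(np.sum(dev**2, axis=-1), axis=0))


def spearman(x, y):
    """Spearman rank correlation; ties are not handled in this sketch."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Toy trajectory: atom 0 is fixed, atom 1 oscillates by ±1 Å along x.
traj = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]]
print(rmsf(traj))

# Hypothetical per-nucleotide local resolution (Å) and RMSF (Å).
local_res = [3.1, 3.4, 4.2, 5.0]
per_nt_rmsf = [0.8, 1.1, 2.5, 3.9]
print(spearman(local_res, per_nt_rmsf))
```

A strong correlation would be consistent with either explanation (poorly resolved density or genuine flexibility), so this is a first look rather than a test that distinguishes them.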
How effective is DeepFoldRNA at filling structural gaps, given that it is mainly known for sequence-based structure prediction? In particular, how reliable is its model of the 38-nucleotide gap?
For benchmarking purposes, applying this approach to other RNA structures or providing additional validation against independent experimental data (e.g., SAXS or chemical shifts from short RNA motifs) could further strengthen the conclusions.
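For validation against SAXS, an ensemble's scattering profile can be compared with experiment. This sketch uses the Debye formula with unit form factors and no hydration shell, averages over ensemble members with equal weights, and fits a single scale factor before computing a reduced chi-square. Dedicated programs such as CRYSOL or FoXS model form factors and solvent properly; this only illustrates the workflow, with hypothetical coordinates.

```python
import numpy as np


def debye_intensity(coords, q):
    """SAXS intensity I(q) from the Debye formula with unit form factors.

    coords: (n_atoms, 3) in Å; q: scalar or array in 1/Å.
    """
    coords = np.asarray(coords, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), so sinc(q r / pi) = sin(q r) / (q r);
    # it returns 1 for the r = 0 self-terms.
    return np.array([np.sinc(qi * r / np.pi).sum() for qi in np.atleast_1d(q)])


def ensemble_profile(frames, q):
    """Average I(q) over ensemble members (equal weights)."""
    return np.mean([debye_intensity(f, q) for f in frames], axis=0)


def reduced_chi2(i_calc, i_exp, sigma):
    """Reduced chi-square after fitting one overall scale factor."""
    i_calc, i_exp, sigma = (np.asarray(v, dtype=float) for v in (i_calc, i_exp, sigma))
    w = 1.0 / sigma**2
    scale = np.sum(w * i_calc * i_exp) / np.sum(w * i_calc**2)
    resid = (i_exp - scale * i_calc) / sigma
    return float(np.sum(resid**2) / (len(i_exp) - 1))


# Two hypothetical ensemble members.
two = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
three = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
q = np.linspace(0.01, 0.3, 10)
profile = ensemble_profile([two, three], q)
print(profile)
```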
Aside from the pre-1r (6ME0) and pre-2r (6MEC) states, Haack et al. (2019) found additional 3D classes that indicated disordered density in conserved regions, which they did not investigate further. They also suggested that the post-2r complex may be captured in one of the 3D classes that yielded a low-resolution 3D reconstruction. Could any of the structures sampled by the metainference method correspond to one of these uninvestigated classes or to the post-2r complex?
Why does the number of formed helices drop suddenly with 32 replicas (Fig. 2E)?
Regarding the sentence on page 4, "As a result, 32 replicas appeared to be the best compromise between agreement with experiment and computational cost": has it been ensured that this is not a result of overfitting? One way to check is half-map cross-validation, i.e., checking whether the refined ensemble fits both half-maps equally well.
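A minimal version of the half-map check, assuming the ensemble is refined against one half-map ("work") and scored against the other ("free"): a model that fits noise correlates well with the work half-map but not with the free one. The sketch uses a masked Pearson correlation on synthetic maps; FSC-based model-map comparisons are the more common choice in practice, and `half_map_check` is a hypothetical helper.

```python
import numpy as np


def masked_cc(map_a, map_b, mask):
    """Pearson correlation between two maps over voxels where mask is True."""
    a = map_a[mask].astype(float)
    b = map_b[mask].astype(float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def half_map_check(model_map, half_work, half_free, mask):
    """CC against the half-map used for refinement ("work") and against the
    held-out half-map ("free"); a large positive gap suggests overfitting."""
    cc_work = masked_cc(model_map, half_work, mask)
    cc_free = masked_cc(model_map, half_free, mask)
    return cc_work, cc_free, cc_work - cc_free


rng = np.random.default_rng(2)
shape = (20, 20, 20)
signal = rng.normal(size=shape)
half_work = signal + rng.normal(size=shape)  # independent noise in each half
half_free = signal + rng.normal(size=shape)
mask = np.ones(shape, dtype=bool)

print("overfit:", half_map_check(half_work, half_work, half_free, mask))
print("honest: ", half_map_check(signal, half_work, half_free, mask))
```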
Have the authors tried changing the weight of the 1 μs reference simulation to improve CC_mask?
Grammar/spelling mistakes:
Results, Test System and Preparation, 1st paragraph: change "run" to "ran" to keep the past tense throughout the paragraph, in the sentence beginning "We then run 2.5 ns-long molecular dynamics (MD) simulations in explicit solvent…"
Results, Ensemble Refinement, 1st paragraph: "within" is misspelled as "whitin" in the sentence beginning "Conversely, when restraining their helicity, whitin this single-replica refinement approach…"
Results, Ensemble Refinement, 1st paragraph: the following sentence is awkwardly phrased: "Specifically, all the simulations attempted with a lower number of replicas were crashing reporting missing convergence in enforcing bond constraints, which indicates that the experimental and helical restraints were mutually incompatible." A possible rewording: "Specifically, all simulations attempted with fewer replicas crashed, reporting a failure to converge when enforcing bond constraints, which indicates that the experimental and helical restraints were mutually incompatible."
- Tushar Raskar, Sonya Lee, James Fraser
Competing interests
The authors declare that they have no competing interests.