Optimized path planning surpasses human efficiency in cryo-EM imaging

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Cryo-EM has become the dominant method in structural biochemistry, and making more efficient use of expensive microscope time is therefore of broad interest to academic and industrial users. The authors identify a bottleneck in cryo-EM data collection, namely path optimization, and provide a valuable machine-learning model to overcome it. The solid data presented suggest that their model can replace a human operator to automate efficient data collection.

This article has been Reviewed by the following groups

Abstract

Cryo-electron microscopy (cryo-EM) is a powerful technology for determining atomic models of biological macromolecules (Kühlbrandt, 2014). Despite this promise, human-guided data collection practices limit the impact of cryo-EM because of a path planning problem: cryo-EM datasets typically represent only 2–5% of the total sample area. Here, we address this fundamental problem by formalizing cryo-EM data collection as a path planning optimization from low-signal data. Within this framework, we combine reinforcement learning (RL) and deep regression to design an algorithm that uses distributed surveying of cryo-EM samples at low magnification to learn optimal data collection policies. Our algorithm, cryoRL, solves the problem of path planning on cryo-EM grids, allowing it to maximize data quality in a limited time without human intervention. A head-to-head comparison of cryoRL with human subjects shows that cryoRL performs in the top 10% of test subjects, surpassing the majority of users in collecting high-quality images from the same sample. CryoRL establishes a general framework that will enable human-free cryo-EM data collection, increasing the impact of cryo-EM across life sciences research.
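
One way to write the path planning problem described above as a constrained optimization is sketched below. The notation is ours, not the authors': $h_t$ is the $t$-th (distinct) hole visited, with $h_0$ the starting stage position; $q(h)$ is the CTFMaxRes of the resulting high-magnification image; $\tau$ is a hypothetical quality cutoff; $c(h_{t-1}, h_t)$ is the time needed to move to and image $h_t$; and $T$ is the available microscope time.

```latex
\max_{K,\;(h_1,\dots,h_K)} \; \sum_{t=1}^{K} \mathbf{1}\!\left[\, q(h_t) \le \tau \,\right]
\qquad \text{subject to} \qquad \sum_{t=1}^{K} c(h_{t-1}, h_t) \le T
```

Because $q(h_t)$ is only known after a hole has been imaged, the path has to be chosen from low-magnification predictions, which is the role of the learned policy.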

Article activity feed

  2. Reviewer #1 (Public Review):

    Li, Fan et al. designed and evaluated a reinforcement learning (RL)-based model to automate the planning of an optimal path for data collection in single-particle cryo-electron microscopy. The goal was to maximize data quality while minimizing the time required for acquisition. They use a deep regressor (DR) to rank all targets on the grid by their quality, as predicted from low-magnification images. In the cryoRL model, the DR prediction is modified by the output of a deep Q-network (DQN) driven by a reward based on real-time assessment of newly acquired images and a penalty based on the time required to move the microscope stage to explore new areas of the specimen. The DR and the DQN are trained on a set of low-magnification preview images and their corresponding high-magnification recordings, labeled by the quality of the contrast transfer function (CTF) fit (the CTFMaxRes parameter). The quality distribution of a series of non-ranked trajectories was used as a snowball baseline (SB). Importantly, all tests in this paper were performed on four datasets collected by exhaustively sampling the grid, so all data are available to all protocols.
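
    A minimal sketch of the kind of reward described above, assuming a binary quality label from a CTFMaxRes cutoff and a fixed penalty whenever the stage moves to a different square. The cutoff (`CTF_THRESHOLD_A`), the penalty value, and the `Hole`, `reward`, `greedy_order` and `episode_return` names are hypothetical illustrations, not cryoRL's code or settings. The DQN that learns to trade these terms off is not shown; `greedy_order` only mimics a DR-only strategy of visiting holes in order of predicted quality.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical values for illustration only; cryoRL's real settings may differ.
CTF_THRESHOLD_A = 6.0     # CTFMaxRes (Å) at or below which an image counts as good
PENALTY_NEW_SQUARE = 0.5  # penalty for moving the stage to a different square


@dataclass(frozen=True)
class Hole:
    square_id: int      # grid square the hole belongs to
    ctf_max_res: float  # CTFMaxRes (Å) of the high-magnification image


def reward(prev: Optional[Hole], nxt: Hole) -> float:
    """+1 for a good image, minus a penalty if the stage changed squares."""
    quality = 1.0 if nxt.ctf_max_res <= CTF_THRESHOLD_A else 0.0
    moved = prev is not None and prev.square_id != nxt.square_id
    return quality - (PENALTY_NEW_SQUARE if moved else 0.0)


def greedy_order(holes: Sequence[Hole], predicted: Sequence[float]) -> List[Hole]:
    """DR-only baseline: visit holes by predicted CTFMaxRes, lowest (best) first."""
    ranked = sorted(zip(predicted, holes), key=lambda pair: pair[0])
    return [hole for _, hole in ranked]


def episode_return(order: Sequence[Hole]) -> float:
    """Sum of rewards along a trajectory that starts with no previous hole."""
    total, prev = 0.0, None
    for hole in order:
        total += reward(prev, hole)
        prev = hole
    return total


# Invented example: two holes in square 0, one in square 1.
holes = [Hole(0, 4.5), Hole(0, 9.0), Hole(1, 5.0)]
predicted = [4.8, 8.5, 5.2]  # hypothetical DR predictions (Å)
print(episode_return(greedy_order(holes, predicted)))  # 1.0
print(episode_return([holes[0], holes[1], holes[2]]))  # 1.5
```

    In this toy setting the purely quality-ranked order returns to square 0 at the end and pays the movement penalty twice, whereas finishing square 0 first scores higher; capturing that trade-off is what the DQN component is meant to learn.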

    When trained on a subset of squares from the same grid, DR+DQN outperforms DR, which in turn outperforms SB. To improve transferability between specimens, both DR and DQN were trained on a large dataset sourced from a variety of samples and grid types imaged at the Cianfrocco Lab. A comparison of the performance of cryoRL (DR+DQN), DR, SB, and human subjects with different levels of expertise shows that cryoRL yields the most high-resolution images in the shortest time. Further, the quality of the maps obtained from subsets of data selected using cryoRL is on par with the best datasets collected manually, although the latter showed marked variability.
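
    The comparison above is phrased in terms of how many high-resolution images each strategy yields within a given amount of microscope time. A minimal sketch of such a metric follows; the function name, the (time cost, CTFMaxRes) trajectory format, and the 6 Å default cutoff are assumptions for illustration, and the two trajectories are invented, not data from the study.

```python
from typing import Sequence, Tuple


def good_images_within_budget(
    trajectory: Sequence[Tuple[float, float]],
    budget_min: float,
    threshold_a: float = 6.0,  # hypothetical CTFMaxRes cutoff (Å)
) -> int:
    """Count images with CTFMaxRes <= threshold_a acquired within budget_min.

    Each step is (time_cost_min, ctf_max_res_a): the minutes spent moving to
    and imaging one hole, and the CTFMaxRes of the resulting image.
    """
    elapsed, good = 0.0, 0
    for cost, ctf in trajectory:
        elapsed += cost
        if elapsed > budget_min:  # this image would finish after time runs out
            break
        if ctf <= threshold_a:
            good += 1
    return good


# Invented trajectories for two strategies on the same grid.
strategy_a = [(1.0, 4.0), (1.0, 5.5), (5.0, 3.9), (1.0, 7.0)]
strategy_b = [(1.0, 8.0), (5.0, 4.0), (5.0, 4.5)]
print(good_images_within_budget(strategy_a, budget_min=8.0))  # 3
print(good_images_within_budget(strategy_b, budget_min=8.0))  # 1
```

    Sweeping `budget_min` over a range of values gives a curve of good images versus time, which allows strategies with different pacing to be compared on the same footing.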

    The demonstration that a low-magnification image contains sufficient information to predict the quality of its high-magnification counterpart is very encouraging. However, the authors show that this translates into a high-resolution structure for only one of the four datasets. The use of CTFMaxRes, although prevalent in the field, is an incomplete estimator of micrograph quality. Even though both the DQN and DR can be trained using different criteria, it is not clear how strongly alternative parameters would correlate with the low-magnification images.

    This study concentrates on three "well-behaved" samples that tend to distribute evenly in the holes. The behavior of many macromolecules (e.g., orientation bias and stability) correlates with ice thickness in convoluted ways. Since ice thickness can vary drastically within a single hole, the overall appearance of a hole may not be sufficient to ensure that the region where "good particles" concentrate is recorded. In these cases, sub-hole characterization from the low-magnification images will be necessary to target the appropriate areas; however, the feasibility of such an approach is yet to be determined. All that said, this is a timely publication that is likely to have a positive impact on the efficiency of cryo-EM data collection.

  3. Reviewer #2 (Public Review):

    The authors identify a bottleneck in cryo-EM data collection, namely path optimization, provide a method and software to address it, and then evaluate the solution on several metrics, including full downstream processing. In addition, the authors report a cryo-EM data collection simulator which, if released, could be used to train users and microscope operators more efficiently. I have experience with cryo-EM and with applications of machine learning to cryo-EM. In my opinion, the results are convincing insofar as they show that the algorithm employed by cryoRL performs at least as well as humans, and with greater consistency. I think combining cryoRL with existing square- and hole-targeting algorithms and collection software has the potential to yield a complete and efficient automated solution for high-resolution cryo-EM data collection.

  4. Reviewer #3 (Public Review):

    The data presented suggest that their algorithm can replace a human operator, which is reason enough to publish and disseminate the technology. At the same time, aspects of the methods and results would benefit from a clearer explanation. For example, the reported R² values for the model's performance are all below 0.5 (0.191, 0.2, 0.345, 0.467). I take this to mean that the model's predictions are better than the mean value but that the model will probably not generalize well to data it hasn't seen. Please comment.
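
    For reference, R² is defined as 1 − SS_res/SS_tot, so R² = 0 corresponds to always predicting the mean of the observed values, R² > 0 means the model explains part of the variance, and R² < 0 means it does worse than the mean. A minimal sketch in plain Python follows; the function name and example numbers are illustrative only and are not taken from the paper.

```python
from statistics import fmean
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # spread around the mean
    return 1.0 - ss_res / ss_tot


observed = [4.0, 6.0, 8.0, 10.0]                  # invented CTFMaxRes values (Å)
print(r_squared(observed, [7.0, 7.0, 7.0, 7.0]))  # 0.0: no better than the mean
print(r_squared(observed, [5.0, 6.0, 8.0, 9.0]))  # 0.9
```

    R² only describes fit on the data it is computed on; an R² computed on data held out from both training and model selection is what speaks to generalization.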

    Did the authors partition their data into a training set, a validation set, and a test set? From the manuscript, it wasn't obvious to me that they withheld a test set (a set of data never seen by the model, used to evaluate the performance of the model selected on the validation set). From Extended Data Figures 1 and 2, I inferred that the number of samples in the confusion matrix matches the validation set size (n = 2341). So, are they reporting validation results rather than test results? Please explain.
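
    The distinction raised above can be made concrete with a three-way split: shuffle once, carve off a test set that is never used for training or model selection, and choose the model on the validation set only. The sketch below is a generic illustration with hypothetical names and fractions, not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    items: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Return disjoint (train, validation, test) lists from one seeded shuffle."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # reproducible order
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]                # scored once, after model selection
    val = shuffled[n_test:n_test + n_val]   # used to pick the model
    train = shuffled[n_test + n_val:]       # used to fit the model
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

    Because holes from the same square or grid are likely to be correlated, splitting by grid (so that no grid contributes to more than one set) would give a stricter estimate of performance on unseen samples.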