Automated systematic evaluation of cryo-EM specimens with SmartScope

Curation statements for this article:
  • Curated by eLife

Abstract

Finding the conditions to stabilize a macromolecular target for imaging remains the most critical barrier to determining its structure by cryo-electron microscopy (cryo-EM). While automation has significantly increased the speed of data collection, specimens are still screened manually, a laborious and subjective task that often determines the success of a project. Here, we present SmartScope, the first framework to streamline, standardize, and automate specimen evaluation in cryo-EM. SmartScope employs deep-learning-based object detection to identify and classify features suitable for imaging, allowing it to perform thorough specimen screening in a fully automated manner. A web interface provides remote control over the automated operation of the microscope in real time and access to images and annotation tools. Manual annotations can be used to re-train the feature recognition models, leading to improvements in performance. Our automated tool for systematic evaluation of specimens streamlines structure determination and lowers the barrier of adoption for cryo-EM.

Article activity feed

  1. Evaluation Summary:

    Bouvette et al. describe new software for fully automated cryo-EM sample screening and data acquisition, making use of deep-learning-based algorithms for the detection of regions and objects of interest. This is the first example of software for fully automated grid screening, which is of great interest to the cryo-EM community because it frees skilled researchers and engineers from a series of tedious tasks, so that they can devote more time to method development or to answering interesting biological and medical questions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    This paper describes a new software tool, SmartScope, for automated screening of cryo-EM grids. SmartScope can also perform automated data collection on suitable grids, including using beam-image shifts and tilted stage geometries. SmartScope uses deep-learning approaches for the selection of squares and holes of interest. The description of the software given in the paper is very promising, but as the code has not yet been made available, I cannot comment on its modularity, ease of installation, or general usability.

    The convolutional neural networks for square and hole detection were trained on relatively few examples, and supposedly all from the same microscope. How easy would it be for users to re-train these detectors for their own purposes? Could a description of that be added to the paper/documentation?

    The introduction makes the same point over multiple pages, and could probably be easily cut in half length-wise. This will force the authors to formulate more succinctly, and thereby more clearly. Hopefully, this would then eliminate wooly or incorrect statements like: "the beginning of each new project is fraught with uncertainty", "[The number of combinations] grows exponentially with the inclusion of each parameter" (it doesn't!), "would be an invaluable tool".

    Also, the first half of the Abstract needs some rewriting. It focuses first on grid optimisation, which is not what SmartScope is about. SmartScope is about grid screening. Just say that and save some lines in the Abstract too.

    Lines 257-261 describe some setup in SerialEM. Perhaps it is because I am not familiar with that software myself, but I had no clue what those lines meant. Perhaps some example setup files could be provided as supplementary information?

    For the DNA polymerase data set: mention in the Results section how long the entire data collection (or 4.3k images) took. Also, the sharpened map in the validation file has a very weird distribution of greyscale values. Its enclosed volume as a function of greyscale threshold is basically a step function, indicating that this is more or less a binary density map. I suspect that this is a result of the DeepEMhancer procedure. But given that the scattering potential of proteins is not binary, I wonder how such a map can be justified. Also, the FSC curve shown in the paper does not mention any masks, but the reported resolution of 3.4 Å is higher than the unmasked resolution calculated by the PDB: 3.7 Å. Why is the DeepEMhancer software used here? Is it hiding a slightly suboptimal map? As map quality is not what this paper is about, perhaps it would suffice to show the original map alone?

  3. Reviewer #2 (Public Review):

    The work "Automated systematic evaluation of cryo-EM specimens with SmartScope" by Bouvette et al. describes new software for fully automated cryo-EM sample screening and data acquisition, making use of deep-learning-based algorithms for the detection of regions and objects of interest. The authors fully succeeded in providing the first piece of software for fully automated cryo-EM grid screening. Being developed within a cryo-EM facility, SmartScope seems to address, at least on paper, all the questions that typically arise during sample screening (e.g. ice-thickness evaluation, sample distribution on ice versus carbon support, molecules' preferential orientation on ice, and reporting of usable areas for data collection) for the most common specimen types (regular holey grids with carbon or gold support and even negative stain specimens). Moreover, it offers the possibility to pipe this information into a follow-up data acquisition step. Another positive aspect is the possibility of intervention at any stage. The authors have described each optimisation step in their pipeline very well, the figures are very clear, and the software seems easy to use. Of course, it will have to pass a test for robustness when the community starts making use of it.

    The fact that SmartScope is made open access is very important to foster the collaborative development of automatic grid screening pipelines. The network infrastructure required seems to be simple and light enough to be adapted to both small and big cryo-EM centres.

  4. Reviewer #3 (Public Review):

    The authors present a modular computational workflow for automated sample screening and collection of cryo-EM data and demonstrate its use for screening and 3D structure determination of human mitochondrial DNA polymerase as a test sample. Despite major advances in automation of microscope operation, optimising and screening sample conditions for the acquisition of high-quality data is still a laborious task that involves human input to navigate low-, medium- and high-magnification images to identify and select specimen areas amenable to high-resolution structure determination; and subjective tuning of parameters that can result in inefficient use of high-end cryo-TEM equipment. Fully automated methods for screening and data collection are therefore needed to meet the increasing demand for access and throughput of cryo-EM. Utilising deep-learning-based object detection algorithms, the authors show that their pre-trained models can effectively detect, classify, and rank regions (grid squares and holes) of interest based on established criteria such as contamination, support film integrity, and ice thickness. A challenge for any such method is the scarcity of annotated data reflecting the broad variety across the wide range of image and sample conditions in cryo-EM, and that selection of the "best" areas may vary by particle and sample preparation conditions. To mitigate this risk, the authors provide a web interface that allows re-training of the feature models and integrates on-the-fly assessment of data quality and adjustment of data collection parameters. As such, the presented pipeline and related approaches can become a useful addition to existing automation software for cryo-EM data collection, in multi-user environments such as cryo-EM facilities. 
    Such approaches will thrive best if software and models are openly available to the cryo-EM community so that annotated data can be added or customised and the quality of the prediction methods can improve over time.

  5. Author Response

    Reviewer #1 (Public Review):

    This paper describes a new software tool, SmartScope, for automated screening of cryo-EM grids. SmartScope can also perform automated data collection on suitable grids, including using beam-image shifts and tilted stage geometries. SmartScope uses deep-learning approaches for the selection of squares and holes of interest. The description of the software given in the paper is very promising, but as the code has not yet been made available, I cannot comment on its modularity, ease of installation, or general usability.

    The convolutional neural networks for square and hole detection were trained on relatively few examples, and supposedly all from the same microscope. How easy would it be for users to re-train these detectors for their own purposes? Could a description of that be added to the paper/documentation?

    Training was done on a mix of images coming from Ceta and K2 detectors. We added more details about the nature of the training data in the Materials and Methods section. Users will be able to re-train the model using the code provided.

    The introduction makes the same point over multiple pages, and could probably be easily cut in half length-wise. This will force the authors to formulate more succinctly, and thereby more clearly. Hopefully, this would then eliminate wooly or incorrect statements like: "the beginning of each new project is fraught with uncertainty", "[The number of combinations] grows exponentially with the inclusion of each parameter" (it doesn't!), "would be an invaluable tool".

    We carefully drafted the introduction to appeal to the broad audience of eLife while emphasizing the significance of our work. We have edited the text to make it more concise without missing the important points while reducing its length by over one third.

    Also, the first half of the Abstract needs some rewriting. It focuses first on grid optimisation, which is not what SmartScope is about. SmartScope is about grid screening. Just say that and save some lines in the Abstract too.

    While SmartScope is not a tool for grid optimization, it provides direct feedback on grid quality, which is a critical component of cryo-EM specimen optimization. To clarify this point, we edited the abstract to highlight the screening aspect of SmartScope and shortened it from 197 to 151 words.

    Lines 257-261 describe some setup in SerialEM. Perhaps it is because I am not familiar with that software myself, but I had no clue what those lines meant. Perhaps some example setup files could be provided as supplementary information?

    Since setup files are tailored to specific hardware combinations, a settings file itself would not be beneficial. However, we added a new supplementary table with examples of 2 tested microscope configurations. As with any software, we expect SmartScope to evolve as new users report bugs and request new features. We also hope that a community of open-source developers will help us move it forward. For that reason, users are encouraged to refer to the “live” table on the documentation website of SmartScope where additional hardware combinations will be posted as the software is tested on new systems.

    For the DNA polymerase data set: mention in the Results section how long the entire data collection (or 4.3k images) took. Also, the sharpened map in the validation file has a very weird distribution of greyscale values. Its enclosed volume as a function of greyscale threshold is basically a step function, indicating that this is more or less a binary density map. I suspect that this is a result of the DeepEMhancer procedure. But given that the scattering potential of proteins is not binary, I wonder how such a map can be justified. Also, the FSC curve shown in the paper does not mention any masks, but the reported resolution of 3.4 Å is higher than the unmasked resolution calculated by the PDB: 3.7 Å. Why is the DeepEMhancer software used here? Is it hiding a slightly suboptimal map? As map quality is not what this paper is about, perhaps it would suffice to show the original map alone?

    Thank you for pointing out the need for more clarity regarding this point. DeepEMhancer seems to apply a more conservative “sharpening” in the lower-resolution areas of the map, leading to more pleasing images. Hence, we used the corrected map for display purposes only. The raw map and half-maps are provided via EMDB. The FSC curve and overall resolution values reported in the paper were obtained using a shape mask produced using standard procedures implemented in cryoSPARC. FSC curves and the local-resolution map (ResMap) were calculated using the half-maps produced during refinement, prior to sharpening. We have now added the requested details to the figure legend. We also added the unmasked resolution to Table 1, together with information about data collection throughput.
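
The masked versus unmasked resolution discussed in this exchange can be illustrated with a short calculation. The sketch below is a minimal NumPy illustration of a Fourier shell correlation (FSC) between two half-maps, with an optional real-space mask applied before the transform. It is not the cryoSPARC or PDB implementation: the function names `fsc_curve` and `resolution_at` are hypothetical, the maps are assumed to be cubic arrays, and the resolution is read off at the first shell that falls below the 0.143 threshold rather than by interpolation.

```python
import numpy as np


def fsc_curve(half1, half2, mask=None):
    """FSC between two cubic half-maps, one value per integer Fourier shell.

    `mask` is an optional real-space array of the same shape; when given,
    both half-maps are multiplied by it before the Fourier transform.
    """
    if mask is not None:
        half1 = half1 * mask
        half2 = half2 * mask
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)

    n = half1.shape[0]
    freq = np.fft.fftfreq(n) * n  # integer frequency index along each axis
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2 + 1          # shells 0 .. Nyquist
    keep = shell < n_shells        # drop the corners beyond Nyquist
    idx = shell[keep]

    # Per-shell sums of the cross term and of each map's power.
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep],
                        minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)

    denom = np.sqrt(p1 * p2)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def resolution_at(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (same length unit as pixel_size) at the first shell
    whose FSC is below `threshold`; Nyquist if it never drops below."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size
    first_shell = below[0] + 1     # +1 because shell 0 was skipped
    return box_size * pixel_size / first_shell
```

Calling `fsc_curve(h1, h2)` and `fsc_curve(h1, h2, mask=m)` on the same half-maps gives the unmasked and masked curves. A mask that excludes solvent noise usually moves the threshold crossing to a higher spatial frequency, which is why a reported resolution should state whether a mask was used.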