Automated filtering of particle images in single particle cryoEM


Abstract

Continued exponential growth in the number of structures resolved by single particle cryoEM, as seen in the last decade, requires ever more effective data analysis workflows. Datasets are rarely homogeneous, so a multistep procedure is needed to discard outliers. Since individual particles are very noisy, either 2D or 3D averages are normally used for discrimination. This becomes challenging when the 2D classes themselves are heterogeneous, which leads to selecting contaminants or discarding useful rare views/poses. 3D model-based discrimination requires trustworthy 3D maps and a correct assignment of Euler angles, which in turn depend on the quality of the initial data and might not be available at the very early stages of the analysis. We propose a novel deep-learning approach for improving the quality of single particle datasets. The two-stage procedure consists of denoising single particle images using a Variational AutoEncoder framework, followed by particle quality filtering based on a score inferred for every particle by a Domain Adaptation Neural Network trained on a large dataset of categorised 2D averages. This approach allows automated scoring of noisy raw images using data patterns learned from externally derived 2D classes with a high signal-to-noise ratio. Consequently, a higher quality dataset enters the computationally expensive steps of the data analysis, reducing the need for protracted and expensive calculations. Importantly, our method does not require any prior knowledge about the data or the existence of a 3D model, making it universally applicable. Tests on publicly available datasets demonstrated that our approach largely outperformed 2D class-based particle discrimination. Smaller subsets of the top-scoring particles selected with our method were sufficient to reach the author-reported 3D model resolution.
When applied to user data in the automated on-the-fly data processing pipeline, the method rescued 30% of cases that would otherwise not have reached the confidence threshold required for deciding to proceed to 3D model refinement. It also led to general improvements in the quality of the 3D models for many datasets that were selected for high-resolution processing.
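
The final filtering step, keeping only the top-scoring particles, can be illustrated with a minimal sketch. It assumes that a per-particle quality score is already available (here a plain list of floats) and that a higher score means a better particle. The function name `select_top_scoring` and the `keep_fraction` parameter are hypothetical and are not taken from the authors' software; the denoising and scoring networks themselves are not reproduced.

```python
def select_top_scoring(scores, keep_fraction=0.5):
    """Return indices of the highest-scoring particles, best first.

    scores        -- one quality score per particle (assumed: higher is better)
    keep_fraction -- hypothetical parameter: fraction of particles to keep
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not scores:
        return []
    # Keep at least one particle, however small the fraction.
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    # Python's sort is stable, so ties keep their original particle order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:n_keep]


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.5, 0.7]
    print(select_top_scoring(example_scores, keep_fraction=0.5))
```

The returned indices could then be used to subset the particle stack before 2D classification or 3D refinement. A fixed score threshold would be an equally plausible selection rule.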
