Metric Ion Classification (MIC): A deep learning tool for assigning ions and waters in cryo-EM and x-ray crystallography structures

Abstract

At sufficiently high resolution, x-ray crystallography and cryogenic electron microscopy are capable of resolving small spherical map features corresponding to either water or ions. Correct classification of these sites provides crucial insight for understanding structure and function as well as guiding downstream design tasks, including structure-based drug discovery and de novo biomolecule design. However, direct identification of these sites from experimental data can prove extremely challenging, and existing empirical approaches leveraging the local environment can only characterize limited ion types. We present a novel representation of chemical environments using interaction fingerprints and develop a machine-learning model to predict the identity of input water and ion sites. We validate the method, named Metric Ion Classification (MIC), on a wide variety of biomolecular examples to demonstrate its utility, identifying many probable mismodeled ions deposited in the PDB. Finally, we collect all steps of this approach into an easy-to-use open-source package that can integrate with existing structure determination pipelines.
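The abstract describes classifying each site from a fingerprint of its chemical environment, using a model trained with metric learning. The sketch below shows the general idea only and is not MIC's implementation. It assumes a hypothetical fingerprint that counts neighboring O, N, S and C atoms in three distance shells, and it assigns a query site to the nearest class centroid. In a metric-learning model the centroids would be computed from learned embeddings rather than from the raw fingerprints used here. The shell boundaries, element list, labels and toy coordinates are all made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical fingerprint definition: distance shells in angstroms and the
# neighbor elements counted in each shell. Not MIC's actual featurization.
SHELLS = [(0.0, 2.6), (2.6, 3.2), (3.2, 4.0)]
ELEMENTS = ["O", "N", "S", "C"]


def shell_fingerprint(site, neighbors):
    """Count neighbor atoms of each element in each distance shell.

    site: (x, y, z) of the water/ion position.
    neighbors: list of (element, (x, y, z)) tuples from the surrounding model.
    """
    fp = [0.0] * (len(SHELLS) * len(ELEMENTS))
    for element, xyz in neighbors:
        if element not in ELEMENTS:
            continue  # ignore elements outside the hypothetical vocabulary
        d = math.dist(site, xyz)
        for s, (lo, hi) in enumerate(SHELLS):
            if lo <= d < hi:
                fp[s * len(ELEMENTS) + ELEMENTS.index(element)] += 1.0
                break
    return fp


def class_centroids(vectors, labels):
    """Average the vectors belonging to each label (e.g. "HOH", "ZN")."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}


def predict(vector, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy training sites: (site, neighbors, label).
train_sites = [
    ((0, 0, 0), [("S", (2.3, 0, 0)), ("S", (0, 2.3, 0)), ("N", (0, 0, 2.1))], "ZN"),
    ((0, 0, 0), [("O", (2.8, 0, 0)), ("O", (0, 2.9, 0)), ("N", (0, 0, 3.0))], "HOH"),
]
fingerprints = [shell_fingerprint(site, nbrs) for site, nbrs, _ in train_sites]
centroids = class_centroids(fingerprints, [label for _, _, label in train_sites])

query = shell_fingerprint((0, 0, 0), [("S", (2.2, 0, 0)), ("N", (0, 2.0, 0))])
print(predict(query, centroids))  # -> ZN
```

Nearest-centroid assignment is one simple way to turn a distance-based embedding into a classifier; a k-nearest-neighbor vote over individual training sites is a common alternative.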

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/10944563.

    This paper develops an innovative strategy for classifying modeled waters and/or ions in high-resolution X-ray crystallography and cryo-EM structures. Local geometric fingerprints are used as input to a deep metric learning model, which classifies the placed ions and waters within input structures based on their local chemical environments.

    While this tool does not present significant improvements over existing tools such as UnDowser and CheckMyMetal, it presents a new method that combines geometric fingerprints with deep metric learning, and it demonstrates the ability to extend water/ion classification to high-resolution cryo-EM and RNA structures, as well as to detect halides.

    Major:

    1) At the beginning of the introduction and the Results section, please clarify that this tool is intended for checking waters and/or ions that have already been modeled.

    2) Please clarify what you mean by 'we remove the initial features of both the density itself'. Is this referring to experimental density or data point density?

    3) Please clarify the re-refinement scheme referred to in the Results section ('x-ray structures were re-refined with the alternative density and Fo-Fc maps were inspected in both cases'), and describe the entire refinement protocol in the Methods section.

    4) Please provide the rationale for the -3 to 3 score used when reassessing potentially mislabeled positions. What criteria contribute to each value? 'We assigned each structure a score between -3 and 3, with increasingly positive scores denoting more support for the MIC prediction and increasingly negative scores support for the original label.'

    5) Because the training/testing split was made over individual sites, did you also check whether the training and test sets are balanced in terms of resolution, R-factors, and deposition date? All of these affect how well many waters and ions fit the experimental data.
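    One possible way to address point 5 is to compare the distribution of a per-site property, such as resolution (each site inheriting the value of its PDB entry), between the training and test sets. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs, using only the standard library. It is an illustration of a check, not an analysis reported in the preprint, and the resolution values are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)


# Hypothetical per-site resolutions (angstroms) for the two splits.
train_res = [1.2, 1.5, 1.8, 2.0, 2.1, 2.4]
test_res = [1.3, 1.6, 1.9, 2.2]
print(f"KS statistic for resolution: {ks_statistic(train_res, test_res):.3f}")  # -> 0.250
```

    The same function applies to R-factors or deposition years; `scipy.stats.ks_2samp` returns this statistic together with a p-value.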

    Minor:

    1) There is a typo ('ues') on page 4 of the intro.

    2) In the Results section, please report the number of PDB entries used for training and their characteristics (resolution cutoff, deposition year, whether they were re-refined, etc.). Likewise, what were the size and characteristics of your test set?

    3) It would be of great benefit to the community if the authors deposited the updated ion/water classifications they manually reviewed in Zenodo or another public repository.

    4) It would be interesting, though likely outside the scope of this paper, to understand how incorrectly positioned waters and/or ions (i.e., those not placed at the center of the density peak) affect MIC or other picking algorithms.

    5) Please specify which PDB entries were chosen for the CheckMyMetal (CMM) comparison.

    Competing interests

    The reviewer (Stephanie Wankowicz) is at the same institution as the first author and knows her personally.