Predicting Protein Crystal Solvent Content from Patterson Maps Using Machine Learning
Abstract
Estimating the solvent content of protein crystals is fundamental to identifying the correct symmetry and phasing of the unit cell. Typically, the number of molecules in the asymmetric unit is not known, so probabilistic methods based on statistics derived from the Protein Data Bank (PDB) are used instead. These methods tend to predict the number of molecules incorrectly in around 20% of cases, which can significantly impede the structure solution pipeline. Here, multiple machine learning approaches are investigated to predict solvent content using Patterson maps. Several architectures are shown to give a significant improvement over current approaches, with prediction errors reduced by over 50%. In addition, the potential of embedded representations of Patterson maps for clustering is demonstrated, which could lead to new approaches for identifying similar structures when processing novel data.
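To illustrate why the number of molecules in the asymmetric unit and the solvent content are tied together, the sketch below uses the standard Matthews relation, which the abstract does not state and which is assumed here only as background: V_M = V_cell / (Z * n * M) and solvent fraction ≈ 1 - 1.23 / V_M. The function names and the crystal parameters are hypothetical and chosen only for illustration. This is not the method of the preprint.

```python
# Minimal sketch of the Matthews relation (an assumption, not taken from the abstract).
# All names and numbers below are hypothetical.

def matthews_coefficient(cell_volume_a3, n_symops, n_mol_asu, mol_weight_da):
    """Return V_M in cubic angstroms per dalton.

    cell_volume_a3: unit cell volume in cubic angstroms
    n_symops:       number of asymmetric units per unit cell (Z)
    n_mol_asu:      assumed number of molecules in the asymmetric unit
    mol_weight_da:  molecular weight of one molecule in daltons
    """
    return cell_volume_a3 / (n_symops * n_mol_asu * mol_weight_da)


def solvent_fraction(v_m, protein_factor=1.23):
    """Approximate solvent fraction from V_M.

    The factor 1.23 corresponds to a typical protein partial specific
    volume; it is a conventional approximation, not a measured value.
    """
    return 1.0 - protein_factor / v_m


if __name__ == "__main__":
    # Hypothetical crystal: 500,000 cubic angstrom cell, Z = 4, 25 kDa protein.
    cell_volume = 500_000.0
    z = 4
    mol_weight = 25_000.0

    # Each candidate copy number implies a different solvent content.
    for n in range(1, 5):
        v_m = matthews_coefficient(cell_volume, z, n, mol_weight)
        print(f"n={n}  V_M={v_m:.2f}  solvent={solvent_fraction(v_m):.3f}")
```

Each candidate copy number implies a different solvent fraction, and several of them can be physically plausible at once. That ambiguity is what makes a direct prediction of solvent content from the diffraction data attractive.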