Systematic evaluation of peptide property predictors with explainable AI technique SHAP


Abstract

Deep learning models are often characterized as black boxes because their stacked mathematical transformations and activation functions are practically uninterpretable and carry little meaning for the user. However, explainable AI methods exist that attribute a model’s output predictions to its inputs. Shapley additive explanations (SHAP) is one such method: it directly quantifies each input’s contribution and qualitatively addresses the question of why a model makes a particular prediction. SHAP is generally hampered by its computational cost, which scales very poorly with input size. Peptide property predictors, which take as input amino acid sequences of roughly 10 to 30 residues, therefore represent ideal systems for applying SHAP. In applying SHAP to models that predict retention time, collisional cross section, peptide flyability, and fragment intensity, we obtain the relative influence that each amino acid has on predicting each property, and can furthermore rationalize the values from the perspective of the amino acids’ chemistry. Simply averaging the Shapley values per amino acid type over an entire validation dataset yields values that correlate strongly with published amino acid indices closely related to the property being predicted. For instance, our average Shapley values for retention time had a Pearson correlation of 0.973 with experimentally measured amino acid retention indices at pH 2. Applying SHAP to the Prosit fragment intensity prediction model shows strong agreement with the mobile proton model, specifically demonstrating the effect of basic amino acids, the proline effect in charge-remote fragmentation, and positively charged amino acids in charge-directed fragmentation. We also use SHAP in a targeted experiment to demonstrate the pathways-in-competition behavior of the model, and reveal a very discrete decision process based on basic residues and the fragment charge.
While SHAP was applied in this work to models of well-understood properties and systems, there is great potential to explain less-studied areas of peptide chemistry and to provide insights into their mechanisms.
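
The per-residue averaging and correlation step described in the abstract can be illustrated with a minimal Python sketch. It assumes per-position Shapley values have already been computed for each peptide. The function names `mean_shap_per_residue` and `pearson_with_index`, the toy sequences, the toy attributions, and the toy reference index are all hypothetical placeholders and are not data or code from the article.

```python
from collections import defaultdict

import numpy as np


def mean_shap_per_residue(sequences, shap_values):
    """Average per-position attributions by amino acid type (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, values in zip(sequences, shap_values):
        if len(seq) != len(values):
            raise ValueError("each sequence needs one attribution per residue")
        for aa, value in zip(seq, values):
            totals[aa] += value
            counts[aa] += 1
    return {aa: totals[aa] / counts[aa] for aa in totals}


def pearson_with_index(mean_shap, reference_index):
    """Pearson correlation over the amino acids present in both mappings."""
    shared = sorted(set(mean_shap) & set(reference_index))
    x = np.array([mean_shap[aa] for aa in shared])
    y = np.array([reference_index[aa] for aa in shared])
    return float(np.corrcoef(x, y)[0, 1])


# Toy inputs (assumptions for illustration only, not values from the article).
sequences = ["ALK", "LLA", "KAL"]
shap_values = [
    [0.5, 2.0, -1.0],
    [2.2, 1.8, 0.7],
    [-1.2, 0.3, 2.0],
]
toy_index = {"A": 1.0, "L": 4.0, "K": -2.0}

means = mean_shap_per_residue(sequences, shap_values)
r = pearson_with_index(means, toy_index)
print(means, r)
```

In a real analysis the attributions would come from a SHAP explainer run on the trained predictor, and the reference mapping would be a published amino acid index for the property of interest.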
