Revisiting graph-based approaches for small protein analysis: Insights from anti-CRISPR protein networks

Abstract

Bacteriophage anti-CRISPR (Acr) proteins have the potential to reduce off-target effects of genome editing by inactivating the CRISPR-Cas bacterial defense. The current challenge lies in their functional annotation: Acr proteins have high structural diversity and low sequence similarity, which renders common homology-based methods unfit. Recent solutions use deep learning models, such as graph convolutional networks, that take protein networks as input. To understand whether these new solutions are fit for niche, sparsely annotated proteins, we focus on three Acr proteins (AcrIF1, AcrIIA1, and AcrVIA1) as a case study. For each, we create protein contact networks (PCNs) and residue interaction graphs (RIGs) based on existing network theory and methodology. We characterize and analyze these protein networks by comparing how each network architecture affects values of small-worldliness. We reexamine a previous method that used node degree, closeness centrality, and residue solvent accessibility to predict functional residues within a protein via a Jackknife technique. We discuss the implications of how the structural information is acquired for the construction of these networks. We demonstrate that functional residues within small proteins cannot be reliably predicted with the Jackknife technique, even when provided with a curated dataset containing representative standardized values for degree and closeness centrality. We show that functional residues within these small proteins have low degrees in both PCNs and RIGs, which makes them susceptible to the known bias of graph convolutional networks towards high-degree nodes. We discuss how understanding the data can be used to further improve deep learning approaches for small proteins.
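
As a rough illustration of the graph measures named in the abstract, the sketch below builds a contact network from residue coordinates and computes node degree and closeness centrality. It is not the authors' pipeline: the 8 Å distance cutoff, the use of a single representative point per residue, and the names `contact_network`, `degree`, `closeness`, and `toy_coords` are all assumptions made for this example, and the coordinates are made up rather than taken from any Acr structure.

```python
from collections import deque
from math import dist


def contact_network(coords, cutoff=8.0):
    """Link residues i and j when their points lie within `cutoff`.

    `cutoff` (in angstroms) is an assumed value, not one taken from the paper.
    """
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree(adj):
    """Number of contacts per residue."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def closeness(adj):
    """Closeness = reachable nodes / sum of shortest-path lengths (0 if isolated)."""
    result = {}
    for source in adj:
        hops = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search over the unweighted graph
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in hops:
                    hops[neighbor] = hops[current] + 1
                    queue.append(neighbor)
        total = sum(hops.values())
        result[source] = (len(hops) - 1) / total if total else 0.0
    return result


# Hypothetical toy coordinates: five points on a line, 5 angstroms apart.
toy_coords = [(5.0 * k, 0.0, 0.0) for k in range(5)]
toy_adj = contact_network(toy_coords)
toy_degree = degree(toy_adj)
toy_closeness = closeness(toy_adj)
```

In this toy chain the end residues have the lowest degree and closeness. A low-degree node of that kind is the case the abstract flags as vulnerable to degree bias in graph convolutional networks.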

Author summary

A bacterium’s CRISPR-Cas defense system acts as a security guard against viruses like bacteriophages. By storing pieces of viral DNA as records, it can recognize and defend the bacterium against threats. Scientists have adapted this effective record-keeping process to perform targeted genome editing. Some bacteriophages have genes that encode anti-CRISPR (Acr) proteins. These proteins act as criminal accomplices to the viral DNA, sneaking it past the bacterium’s security in a variety of ways. There has been increased interest in using these Acr proteins to limit unintended or off-target effects of targeted genome editing. However, Acr proteins are difficult to identify. We changed parts of a previous method that used graph representations of protein structure to determine important amino acids that help a protein perform its function. We applied these methods to three Acr proteins to determine whether we observed similar patterns in these graphs. We explain how features of these graph representations of protein structures can affect graph neural networks that use them as input to learn more about proteins.
