A systematic evaluation of the language-of-viral-escape model using multiple machine learning frameworks

Abstract

Predicting the evolutionary patterns of emerging and endemic viruses is key for mitigating their spread in host populations. In particular, it is critical to rapidly identify mutations with the potential for immune escape or increased disease burden (variants of concern). Knowing which circulating mutations are such variants of concern can inform treatment or mitigation strategies such as alternative vaccines or targeted social distancing. A recent study proposed that variants of concern can be identified using two quantities extracted from protein language models, grammaticality and semantic change. These quantities are defined in analogy to concepts from natural language processing. Grammaticality is intended to be a measure of whether a variant viral protein is viable, and semantic change is intended to be a measure of potential for immune escape. Here, we systematically test this hypothesis, taking advantage of several high-throughput datasets that have become available, and also testing additional machine learning models for calculating the grammaticality metric. We find that grammaticality can be a measure of protein viability, though the more traditional metric ΔΔG appears to be more effective. By contrast, we do not find compelling evidence that semantic change is a useful tool for identifying immune escape mutations.
