Protein and genomic language models chart a vast landscape of antiphage defenses
Abstract
The bacterial pangenome encodes an immense array of antiphage systems, yet much of their diversity remains uncharted. In this study, we developed language models to predict novel antiphage proteins in two ways: first, by fine-tuning ESM2, a protein language model capable of detecting distant homology to known defense proteins; second, via a genomic language model with the ALBERT architecture, which predicts defensive function from genomic context. We demonstrate that applying these approaches to Actinomycetota, a phylum largely unexplored for antiphage defenses, can accurately predict previously unknown functional defense mechanisms, leading to the discovery and experimental validation of six defense systems with novel antiphage proteins. Analysis of over 30,000 bacterial genomes predicted more than 45,000 uncharacterized protein families potentially involved in antiphage defense, underscoring the vast, untapped diversity of these systems.