The Viral AlphaFold Database of monomers and homodimers reveals conserved protein folds in viruses of bacteria, archaea, and eukaryotes

Abstract

Viruses are among the most abundant and genetically diverse entities on Earth, yet the functions and evolutionary origins of most viral proteins remain poorly understood. Their rapid evolution often obscures evolutionary relationships, making it difficult to assign functions using sequence-based methods alone. Although conservation of protein fold can reveal deep homologies undetectable by sequence comparison, viral proteins remain vastly underrepresented in structural databases, limiting our ability to explore them at the structural level. Here, we address this gap by clustering all unique viral sequences from the NCBI RefSeq database and predicting the structures of ∼27,000 representative proteins using AlphaFold2, creating a large-scale viral structural resource, the Viral AlphaFold Database (VAD). We uncover ∼10,000 proteins belonging to clusters that share folds across viruses infecting bacteria, archaea, and eukaryotes, indicating that these folds are conserved across viruses of hosts from all three domains of life. We also predict oligomeric states using AlphaFold2-based homodimer modelling, alongside structural comparisons to the Protein Data Bank, providing valuable new data on the potential for viral proteins to oligomerise. We further reveal that large regions of the viral protein universe remain functionally dark and report the discovery and experimental validation of a previously uncharacterised antiviral toxin-antitoxin (TA) system. VAD is a resource that provides a foundation for exploring viral structure–function relationships, including ancient folds that shape viral interactions across all life. Predicted structures used in this study are available at data-sharing.atkinson-lab.com/vad/.

Article activity feed

  1. data-sharing.atkinson-lab.com/vad/.

    I really appreciate that you have made the structures available, and I also appreciate the detailed metadata in the supplementary table (associated host, 'brightness', etc.). It would be great if you could set up a web portal to host all of this, to make it easier for others to view and reuse your dataset (for example, to download structures per host or per cluster).

  2. Recent viral structure databases, such as the BFVD [18] and ViralZone [10] projects, have addressed this gap to a large extent. However, these resources rely on less accurate methods than the reference AlphaFold2 implementation and are limited to monomeric structural predictions.

    https://www.biorxiv.org/content/10.1101/2024.12.19.629443v1.full

    It might be nice to also compare your study to the Viro3D work, which combines AlphaFold2-ColabFold and ESMFold to predict protein structures for animal-infecting viruses.
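
On the suggestion in comment 1 of per-host or per-cluster downloads: below is a minimal sketch of how such a selection could be made from a metadata table, pending any portal. The column names (`structure_file`, `host`, `cluster`) and the example rows are hypothetical and are not taken from the actual supplementary table, which may be organised differently.

```python
import csv
import io
from collections import defaultdict


def group_structures(rows, key):
    """Map each value of `key` (e.g. host or cluster) to a list of structure file names."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["structure_file"])
    return dict(groups)


# Hypothetical metadata excerpt; the real table may use different columns and values.
EXAMPLE_TSV = (
    "structure_file\thost\tcluster\n"
    "protA.pdb\tbacteria\tc1\n"
    "protB.pdb\tarchaea\tc1\n"
    "protC.pdb\tbacteria\tc2\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
by_host = group_structures(rows, "host")        # files grouped by associated host
by_cluster = group_structures(rows, "cluster")  # files grouped by cluster
```

With groupings like these, a user could fetch only the files listed under one host or one cluster instead of the full archive.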