ECOD: Classification of domains in AFDB Swiss-Prot structure predictions
Abstract
The development of highly accurate protein structure prediction algorithms has led to an explosion of structural data, transforming our understanding of protein structure-function relationships across diverse organisms. Domain classifications such as the Evolutionary Classification of Protein Domains (ECOD) have incorporated these computational predictions alongside experimental structures to create comprehensive resources for the research community. The AlphaFold Database (AFDB) plays a unique role, providing millions of predicted structures that ECOD has systematically classified for human proteins, small pathogens, and reference proteomes. Here, we extend this classification framework to the Swiss-Prot/UniProtKB dataset, applying the Domain Parser for AlphaFold Models (DPAM) pipeline to classify domains from over 542,000 Swiss-Prot protein structure predictions, resulting in more than 1,032,000 classified domains. These domains span 3,493 ECOD topologies and display high assignment confidence (mean DPAM probability: 0.992), with extensive taxonomic and functional diversity. Notably, over 100,000 domains lack existing Pfam mappings, indicating novel evolutionary groups. These results significantly expand ECOD’s coverage into a functionally and taxonomically diverse protein space, anchoring high-confidence structure predictions in an evolutionary framework. By integrating Swiss-Prot predictions, we enhance the utility and interpretability of AlphaFold models and establish a foundation for future large-scale, functionally informed domain classifications.
Author Summary
Large-scale protein structure prediction using deep learning has revolutionized our ability to study protein families and infer biological function. However, connecting these predicted structures to well-understood evolutionary classifications remains challenging. In this work, we apply a domain parsing pipeline to classify over half a million AlphaFold-predicted Swiss-Prot proteins into evolutionary groups using ECOD, a structure-based domain classification system. This enables the systematic integration of structural predictions with functional annotation across a broad range of species. Our analysis reveals taxonomic and functional diversity, highlights domain clusters with no prior annotation, and expands ECOD coverage with high-confidence, evolutionarily meaningful predictions.