3D-Beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources

Abstract

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modelling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
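The collation step described in the abstract can be pictured with a minimal sketch. It assumes each member resource is wrapped in a function that takes a UniProt accession and returns a list of model records. The record fields, the provider names, and the placeholder data are all hypothetical and are not taken from the actual 3D-Beacons API.

```python
from typing import Callable, Dict, List

# Hypothetical record shape; the real 3D-Beacons schema may differ.
Record = Dict[str, str]
Provider = Callable[[str], List[Record]]


def collate_models(accession: str, providers: Dict[str, Provider]) -> List[Record]:
    """Query every provider for one accession and merge the answers."""
    merged: List[Record] = []
    for name, fetch in providers.items():
        try:
            records = fetch(accession)
        except ConnectionError:
            # Assumption: the hub skips a member that is unreachable
            # instead of failing the whole query.
            continue
        # Tag each record with its source so models stay comparable.
        merged.extend(
            {**record, "provider": name, "accession": accession}
            for record in records
        )
    return merged


# Stand-in providers returning placeholder data (not real query results).
def fake_experimental(accession: str) -> List[Record]:
    return [{"model_id": "placeholder-exp-1", "category": "experimentally determined"}]


def fake_predicted(accession: str) -> List[Record]:
    return [{"model_id": "placeholder-pred-1", "category": "template-based"}]


def fake_offline(accession: str) -> List[Record]:
    raise ConnectionError("provider unavailable")


PROVIDERS: Dict[str, Provider] = {
    "experimental": fake_experimental,
    "predicted": fake_predicted,
    "offline": fake_offline,
}

if __name__ == "__main__":
    for model in collate_models("P15056", PROVIDERS):
        print(model["provider"], model["model_id"], model["category"])
```

Tagging every record with its provider keeps the origin of each model visible after merging, which is what makes side-by-side comparison across providers possible.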

Article activity feed

  1. Abstract

    Reviewer1: Lim Heo

    In this manuscript, the authors describe a new platform, 3D-Beacons, which is an interface for accessing multiple sources of computational protein models (e.g., AlphaFold DB, SWISS-MODEL) and experimentally determined structures. As the number of protein sequences grows much faster than experimental structure databases (e.g., PDB), computational protein structure models are great alternatives for proteins that do not have experimentally determined structures. Many accurate protein models have become available thanks to decades of progress in template-based modeling techniques and recent advances in de novo protein structure prediction methods using machine-learning approaches. However, these models have been scattered across their providers' own databases, which has made them difficult to access. Thus, in my opinion, the development of a new database or platform, 3D-Beacons, for accessing various computational models is a great step forward for the structural biology field. The manuscript describes the platform and some technical details well. I have a few minor comments on this work.

    1. I recently noticed that RCSB PDB has also made it possible to search computational protein models by extending its web interface. That database includes ~1 million models from AlphaFold DB and ~1,100 models from ModelArchive, which are main sources for this work as well and are maintained by some of the authors of this work. Even though the number of models and the diversity of sources accessible via the RCSB PDB interface are smaller than in this work, I think the purposes of the two works are similar. As there is some overlap between this work and the RCSB PDB interface in terms of data providers (and authors), what is the significance of this work compared to the RCSB PDB interface?

    2. Most computational models come from a few data providers: AlphaFold DB, SWISS-MODEL Repository, and AlphaFill (for ligands). In my opinion, it would be better to make the platform richer by recruiting more diverse data providers with different points of view (e.g., conformational ensembles) or different modeling approaches (e.g., machine learning-based approaches with pre-trained protein language models such as OmegaFold). Is there any plan for such progress or promotion of the platform?

    3. It would be better to have a guide for model selection when a search returns multiple models for one UniProt ID. Alternatively, providing universal quality assessment scores for models (through an additional data provider) would be an option. Currently, pLDDT scores are provided, but they are difficult to compare between modeling methods because they were trained independently for each method.

    4. I was able to search on the 3D-Beacons web page a few days ago. However, I could not do so at the time of writing these review comments (Sept. 13, 6 p.m. EDT).

    Reviewer2: Carlos Rodrigues

    This manuscript describes in detail the 3D-Beacons platform/initiative, which aims to facilitate access to 3D data as well as meta-information about experimentally determined and computationally predicted protein structures. This resource is very valuable for the broader scientific community at a time when the amount of available protein structure data is rapidly increasing and many structures may be available for the same protein.

    A minor correction is required on page 7, where the authors describe 4 different types of protein structures: Experimentally determined, Template-based, Ab-initio and Conformational Ensembles. In many examples available on the website (e.g. https://www.ebi.ac.uk/pdbe/pdbekb/3dbeacons/search/P15056), there is one extra category: structures derived from "Deep learning" methods. I am assuming this comprises a subset of Ab-initio structures, which the authors decided to keep as a separate category after submitting this study for publication. The main text should be updated to reflect this change, as should Figure 4.
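A mismatch of this kind between documented and served categories could be checked mechanically. The sketch below is only an illustration under stated assumptions: the field names `model_identifier` and `model_category`, the exact spelling of the category labels, and the sample records are all hypothetical placeholders, not output from the 3D-Beacons service.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# The four categories named in the manuscript; label spellings are assumed.
DOCUMENTED_CATEGORIES = frozenset({
    "EXPERIMENTALLY DETERMINED",
    "TEMPLATE-BASED",
    "AB-INITIO",
    "CONFORMATIONAL ENSEMBLE",
})


def undocumented_categories(records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count records whose category is not among the documented ones."""
    counts = Counter(
        record["model_category"].strip().upper() for record in records
    )
    return {
        label: n for label, n in counts.items()
        if label not in DOCUMENTED_CATEGORIES
    }


# Placeholder records for illustration only (not real search results).
SAMPLE_RECORDS = [
    {"model_identifier": "placeholder-1", "model_category": "Experimentally determined"},
    {"model_identifier": "placeholder-2", "model_category": "Template-based"},
    {"model_identifier": "placeholder-3", "model_category": "Deep learning"},
    {"model_identifier": "placeholder-4", "model_category": "Deep learning"},
]

if __name__ == "__main__":
    print(undocumented_categories(SAMPLE_RECORDS))
```

Normalising the labels to upper case before comparison avoids false alarms from capitalisation differences between the text and the served data.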