Utilizing artificial intelligence system to build the digital structural proteome of reef-building corals

Abstract

Reef-building corals play an important role in the marine ecosystem, and analyzing their proteomes from a structural perspective will help in exploring their biology. Here we integrated mass spectrometry with the newly published ColabFold to obtain digital structural proteomes of dominant reef-building corals. We identified 8,382 proteins co-expressed in A. muricata, M. foliosa and P. verrucosa, and obtained predicted structures for 8,166 of them after around 4,060 GPU hours of computation. The resulting dataset covers 83.6% of residues with a confident prediction, while 25.9% have very high confidence. Our work provides informative predictions for coral research, confirms the reliability of ColabFold in practice, and is expected to serve as a reference case in the impending high-throughput era of structural proteomics.
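
As an illustration of how residue-level confidence figures such as those in the abstract can be derived, the sketch below reads per-residue pLDDT scores from a PDB-format model, where AlphaFold and ColabFold store pLDDT in the B-factor column. The thresholds (above 70 for "confident", above 90 for "very high") follow the AlphaFold convention and are an assumption here; the abstract does not state which cut-offs the authors used. The function names are hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Return one pLDDT value per residue, read from the B-factor
    field (columns 61-66) of the C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; keep only C-alpha atoms.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_coverage(scores, confident=70.0, very_high=90.0):
    """Return (% residues above `confident`, % residues above `very_high`).

    The default thresholds are assumed, not taken from the article.
    """
    if not scores:
        raise ValueError("no residues found")
    n = len(scores)
    pct_confident = 100.0 * sum(s > confident for s in scores) / n
    pct_very_high = 100.0 * sum(s > very_high for s in scores) / n
    return pct_confident, pct_very_high
```

Pooling the per-residue scores of all models before calling `confidence_coverage` would give a dataset-wide figure, since the abstract reports coverage as a share of residues rather than of proteins.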

Article activity feed

  1. Abstract

    Reviewer 1: Jianyi Yang

    The authors present predicted structures for the proteome of reef-building corals. In total, 8,382 protein sequences were obtained experimentally and fed into ColabFold for structure modeling, generating 8,166 structure models. Overall, this is a valuable study toward understanding reef-building corals. Here are a few comments for possible improvement.

    1. Proteome-wide structure prediction has become trivial nowadays with AlphaFold2 and other methods. I think the major contribution of the current study is the determination of the proteome sequences rather than the structure prediction. Thus, I would encourage the authors to spend more effort on analyzing the sequences, for example, how the sequences cover the Pfam families, how redundant the sequences are, how much they overlap with the sequences in UniProt, etc.
    2. It may be meaningful to compare the predicted structure models to the SCOP or CATH database to see the fold distribution and whether there are any new folds.
    3. What happened to the ~200 proteins for which ColabFold failed?
    4. I suggest adding a browse function to the server for browsing the data.
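
    The gap raised in comment 3 (8,382 sequences versus 8,166 models, i.e. 216 proteins) could be enumerated with a small script. The following is a minimal sketch, assuming the input sequences are in FASTA format and that each model file is named after its sequence identifier (for example `<id>.pdb`); both the naming scheme and the function names are hypothetical and not taken from the manuscript.

```python
from pathlib import PurePath


def fasta_ids(fasta_text):
    """Return the sequence identifiers (first token after '>') in input order."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def missing_predictions(fasta_text, model_filenames):
    """List identifiers that have a sequence but no structure model.

    Assumes (hypothetically) that each model file is named '<id>.<ext>'.
    """
    predicted = {PurePath(name).stem for name in model_filenames}
    return [pid for pid in fasta_ids(fasta_text) if pid not in predicted]
```

    Tabulating the length of each missing sequence alongside its identifier would be a natural next step, since very long proteins are a common reason for prediction failures.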

    Reviewer 2: Brendan Robert E. Ansell

    Zhu and colleagues report the generation of predicted protein structures via AlphaFold for three coral species: A. muricata, M. foliosa and P. verrucosa. Mass-spec analysis of the proteome of the three species is also performed. The authors describe a handful of structures that appear to be orthologues across the species and may have functions as pore-forming toxins, in calcium deposition and in host-symbiont interactions. The generated protein structures will be of use to the scientific community and the web server is quite good.

    Major comments:

    1. Please ensure that the entire structure repository is available for unrestricted download, as per http://corals.bmeonline.cn/prot/release.php
    2. 'Co-expression' is used incorrectly. I assume the authors mean protein orthologues (i.e., homologues across species). Please replace it with 'homologous proteins' throughout, including at http://corals.bmeonline.cn/prot/release.php
    3. The link from 'CoralBioinfo' gives a 404 error: http://corals.bmeonline.cn/index.php
    4. In http://corals.bmeonline.cn/blast/, please include a link back to http://corals.bmeonline.cn/prot/
    5. Although the manuscript lacks bioinformatic analysis of the structural proteome, this is not required for the data note category, but it would enhance the value of the publication if provided.
    6. In terms of validation, there is a technical control for the AlphaFold instance that this project used, which the authors should include. Specifically, please report the RMSD between the structures predicted in this work and the published AlphaFold structures for the same proteins: Acropora muricata (20 proteins), Montipora foliosa (8 proteins) and Pocillopora verrucosa (70 proteins), available at e.g. https://alphafold.ebi.ac.uk/search/text/Montipora%20foliosa%20?organismScientificName=Montipora%20foliosa
    7. Please detail in the methods how the mass spec data relates to improving the genome or proteome annotation of each species. How was the mass spec data used? I presume it was used to identify 3-way orthologues between the species, producing the "8,382 co-expressed proteins" that were selected for structural prediction.
    8. The data dump would be stronger if the mass spec proteomics data were also made available. What proportion of the structural proteome has mass-spectral support?
    9. Please include a supplementary text file containing the key features of each predicted protein, e.g. % high-confidence structure, gene ID, InterPro domain annotations, and top BLAST homologues.
    10. The long proteins could be split by domain to provide some structural information.
    11. To boost the value of this data, the authors might also consider predicting the coral symbiont proteomes, followed by integrative analysis of host and symbiont proteomes to predict interacting partners.
    12. What are the domain and sequence features of the low and very-low confidence predictions?
    13. Is the reference genome available for any species? What is its completeness and content? How do the mass spec and structural data improve the genome annotation, and vice versa?
    14. At present, large parts of the discussion are irrelevant. Comments about COVID-19 and the role of bioinformaticians are outside the scope of a research report.

    Minor comments:

    1. Comment on whether toxicity is reported for these coral species.
    2. Use full genus names on first use.
    3. Proofreading of grammar is required throughout, as is elimination of non-scientific phrasing.
    4. Drop irrelevant arguments regarding COVID-19 and the call to arms for bioinformaticians.
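
    The RMSD control requested by Reviewer 2 could be computed along the following lines. This is a minimal sketch using NumPy and the standard Kabsch superposition, not the procedure of the authors or of the reviewer; it assumes the two inputs are C-alpha coordinate arrays of the same protein, already matched residue by residue, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Rows of P and Q are assumed to correspond to the same residues.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its SVD give the optimal rotation (Kabsch).
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

    Reporting one such value per protein shared with the AlphaFold database would make the control easy to summarize in a table or histogram.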