Assessing scoring metrics for AlphaFold2 and AlphaFold3 protein complex predictions

Abstract

Recent breakthroughs in AI-driven protein structure prediction have revolutionized structural biology, unlocking new possibilities to model complex biomolecular interactions. We evaluated widely used scoring metrics for assessing models predicted by ColabFold with templates, ColabFold without templates, and AlphaFold3. We benchmarked the optimal cutoffs for these assessment scores using a set of 223 high-resolution heterodimeric protein structures and their predictions. Our results show that ColabFold with templates and AlphaFold3 perform similarly, and both outperform ColabFold without templates. However, the assessment scores perform best on ColabFold without templates. Furthermore, interface-specific scores are more reliable for evaluating protein complex predictions than the corresponding global scores. Notably, ipTM and model confidence achieve the best discrimination between correct and incorrect predictions. Based on our results, we developed a weighted combined score, C2Qscore, to improve model quality assessment. We used C2Qscore to analyse dimers from large assemblies solved by cryo-EM, revealing potential limitations of the existing metrics when multiple configurations of heterodimers are possible. This study provides insights into the strengths and weaknesses of current scores and offers guidance for improving protein complex model assessment under realistic use-case conditions. C2Qscore has been integrated as a tool into our ChimeraX plug-in PICKLUSTER v.2.0 and is also available as a command-line tool (https://gitlab.com/topf-lab/c2qscore).
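
To illustrate what benchmarking an "optimal cutoff" for a score such as ipTM can look like, the following is a minimal sketch. It is not the authors' procedure: the abstract does not state how cutoffs were selected, so the use of Youden's J statistic (true-positive rate minus false-positive rate) is an assumption, and the function name `best_cutoff` and the toy values are hypothetical.

```python
from typing import Sequence, Tuple


def best_cutoff(scores: Sequence[float],
                is_correct: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A prediction is called "correct" by the score when score >= cutoff.
    Assumption: Youden's J is the selection criterion; the source does
    not specify one.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for c in is_correct if c)
    n_neg = len(is_correct) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect model")

    best = (float("nan"), -1.0)
    # Try every observed score as a candidate cutoff, lowest first;
    # on ties in J the lowest cutoff is kept.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, c in zip(scores, is_correct) if c and s >= cutoff)
        fp = sum(1 for s, c in zip(scores, is_correct) if not c and s >= cutoff)
        j = tp / n_pos - fp / n_neg
        if j > best[1]:
            best = (cutoff, j)
    return best


# Toy, made-up values for illustration only (not data from the study).
toy_iptm = [0.92, 0.85, 0.78, 0.60, 0.55, 0.40, 0.30, 0.20]
toy_correct = [True, True, True, False, True, False, False, False]
cutoff, j = best_cutoff(toy_iptm, toy_correct)
print(f"cutoff={cutoff:.2f}, Youden's J={j:.2f}")
```

In practice the labels would come from comparing each predicted model with its experimental structure, and other criteria (for example maximizing F1 or fixing a false-positive rate) could be swapped in for Youden's J.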

Impact of this work

Many essential cellular functions rely on protein complexes, which are now predominantly predicted using AlphaFold by both experts and non-experts. This study systematically evaluates the performance of multiple widely used scoring metrics for distinguishing accurate from poor predictions. The new C2Qscore developed in this study improves the reliability of AlphaFold model assessments, enabling a more consistent and accessible evaluation of protein structures. These advancements support downstream applications in biomedicine, drug discovery, and computational protein design.
