MassiveFold Data for CASP16-CAPRI: A Systematic Massive Sampling Experiment

Nessim Raouraoua
Marc F. Lensink
Guillaume Brysbaert


Abstract

Massive sampling with AlphaFold2 has become a widely used approach in protein structure prediction. Here we present the MassiveFold CASP16-CAPRI dataset, a systematic, large-scale sampling of both monomeric and multimeric protein targets. By exploiting maximal parallelization, we produced up to 8040 models per target and shared them with the community for collaborative selection and scoring. This collective effort minimizes redundant computation and environmental impact, while giving resource-limited groups, especially those focused on scoring, access to high-quality structures. In our analysis, we define an interface-difficulty classification based on DockQ metrics and show that massive sampling yields its largest gains for most of the challenging interfaces. Crucially, this classification can be predicted from the median ipTM score of a routine AF2 run, so users can deploy massive sampling selectively, only where it is most needed. Combined with reducing massive sampling from 8040 to 2475 predictions per target, such targeted strategies dramatically cut computation time and resource use with minimal loss of accuracy. Finally, we underscore the persistent challenge of selecting the best models from massive-sampling datasets and the need for more robust scoring methods. The MassiveFold datasets, together with AlphaFold ranking scores and CASP and CAPRI assessment metrics, are publicly available at https://github.com/GBLille/CASP16-CAPRI_MassiveFold_Data to accelerate further progress in protein structure prediction and assembly modeling.
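The targeted strategy described above can be illustrated with a minimal sketch: run a routine AF2 prediction, take the median ipTM over its models, and schedule the reduced massive-sampling set only when that median suggests a difficult interface. This is not the authors' procedure. The cutoff `HYPOTHETICAL_IPTM_CUTOFF` and the helper names are illustrative assumptions, as is the assumption that a lower median ipTM indicates a harder interface; only the sample sizes 8040 and 2475 come from the abstract.

```python
from statistics import median

# ASSUMPTION: illustrative cutoff only; the paper's actual ipTM-based
# difficulty rule is not reproduced here.
HYPOTHETICAL_IPTM_CUTOFF = 0.6

FULL_MASSIVE_SAMPLING = 8040     # max models per target in the dataset
REDUCED_MASSIVE_SAMPLING = 2475  # reduced sampling size from the abstract


def needs_massive_sampling(routine_iptm_scores, cutoff=HYPOTHETICAL_IPTM_CUTOFF):
    """Return True when the routine run's median ipTM falls below the cutoff.

    Assumes (hypothetically) that a low median ipTM flags a difficult interface.
    """
    if not routine_iptm_scores:
        raise ValueError("need at least one ipTM score from the routine run")
    return median(routine_iptm_scores) < cutoff


def planned_predictions(routine_iptm_scores, cutoff=HYPOTHETICAL_IPTM_CUTOFF):
    """Number of extra predictions to schedule: the reduced set, or none."""
    if needs_massive_sampling(routine_iptm_scores, cutoff):
        return REDUCED_MASSIVE_SAMPLING
    return 0


def fraction_of_full_sampling(n_predictions):
    """Share of the full 8040-model budget that n_predictions represents."""
    return n_predictions / FULL_MASSIVE_SAMPLING


if __name__ == "__main__":
    low_confidence_run = [0.31, 0.45, 0.52, 0.28, 0.40]  # median 0.40
    high_confidence_run = [0.85, 0.88, 0.90]             # median 0.88
    print(planned_predictions(low_confidence_run))   # 2475
    print(planned_predictions(high_confidence_run))  # 0
    print(round(fraction_of_full_sampling(REDUCED_MASSIVE_SAMPLING), 3))  # 0.308
```

The last line shows the arithmetic behind the reduction: 2475 predictions are roughly 31% of the full 8040-model sampling. Gating on the median rather than the maximum ipTM is meant to keep a single lucky high-confidence model from hiding a generally unreliable interface, but the real decision rule should be taken from the paper.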

Version published to 10.1002/prot.70040
Aug 28, 2025
Version published to 10.1101/2025.05.26.653955 on bioRxiv
May 27, 2025
