AF-CALVADOS: AlphaFold-guided simulations of multi-domain proteins at the proteome level

Abstract

Deep-learning methods have transformed our ability to predict the three-dimensional structures of folded proteins from sequence, and coarse-grained simulations have made it possible to study intrinsically disordered proteins at the proteome scale. More than half of human proteins, however, contain mixtures of disordered regions and one or more folded domains, and the biological function of these multi-domain proteins depends on the interplay between the folded and disordered regions. Here, we developed AF-CALVADOS, a coarse-grained simulation model that is informed by AlphaFold to model the dynamics of intrinsically disordered proteins and multi-domain proteins containing mixtures of folded and disordered regions. AF-CALVADOS uses information from AlphaFold 2 to model folded regions, which we then integrate with the coarse-grained CALVADOS model. Our automated framework makes it possible to perform simulations of any soluble folded or disordered protein without manually defined folded regions, enabling scaling to the proteome level. We validate AF-CALVADOS against experimental data and by assessing the consistency between our simulations and the AlphaFold 2 predicted aligned error matrix. We demonstrate the scalability of AF-CALVADOS by performing simulations of 12,483 cytosolic human proteins and make the data freely available; we envisage that large-scale simulation data generated by AF-CALVADOS can be used to benchmark or train machine-learning models for flexible, multi-domain proteins. The conformational ensembles can be used to study sequence-dynamics-function relationships at scale, and can shed light on the interplay between folded and disordered regions. We exemplify this by analysing the disordered regions in 1,487 human transcription factors.
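The abstract does not spell out how AlphaFold 2 information is turned into a description of folded regions, so the following is only an illustrative sketch of one plausible approach, not the authors' implementation. It assumes that residue pairs which are close in the predicted structure and have a low predicted aligned error (PAE) could be held together by restraints, while all other pairs are left to the coarse-grained force field. The function name `select_restraint_pairs` and the cutoffs `pae_cutoff`, `dist_cutoff`, and `min_seq_sep` are hypothetical choices made for this example.

```python
import numpy as np


def select_restraint_pairs(coords, pae, pae_cutoff=5.0, dist_cutoff=9.0, min_seq_sep=3):
    """Pick residue pairs that look rigidly packed in a predicted structure.

    coords: (N, 3) array of one-bead-per-residue positions (e.g. C-alpha), in angstrom.
    pae:    (N, N) AlphaFold predicted aligned error matrix, in angstrom.
    All cutoffs are hypothetical values chosen for illustration only.
    Returns a list of (i, j, d0) tuples, where d0 is the predicted distance.
    """
    coords = np.asarray(coords, dtype=float)
    pae = np.asarray(pae, dtype=float)
    n = coords.shape[0]
    if coords.shape != (n, 3) or pae.shape != (n, n):
        raise ValueError("expected coords of shape (N, 3) and pae of shape (N, N)")

    # PAE is not symmetric; take the larger of the two values as a conservative choice.
    pae_sym = np.maximum(pae, pae.T)

    # Pairwise distances between all beads in the predicted structure.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Consider each pair once, skipping pairs closer than min_seq_sep in sequence.
    i, j = np.triu_indices(n, k=min_seq_sep)
    keep = (pae_sym[i, j] < pae_cutoff) & (dist[i, j] < dist_cutoff)

    return [(int(a), int(b), float(dist[a, b])) for a, b in zip(i[keep], j[keep])]
```

Under these assumptions, pairs within a confidently predicted domain would be selected, whereas pairs spanning a disordered linker or two independently moving domains (high PAE) would not, which is the kind of automatic folded/disordered distinction the abstract describes.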
