Molecular grammars of intrinsically disordered regions that span the human proteome

Abstract

Intrinsically disordered regions (IDRs) of proteins are defined by functionally relevant molecular grammars. The term refers to IDR-specific non-random amino acid compositions and non-random patterning of distinct pairs of amino acid types. Here, we introduce GIN (Grammars Inferred using NARDINI+) as a resource, which we have used to extract the molecular grammars of all human IDRs and classify them into thirty distinct clusters. Unbiased analyses of IDRome-spanning grammars reveal that specialized IDR grammar features direct biological processes, cellular localization preferences, and molecular functions. IDRs with exceptional grammars, defined as sequences with high-scoring non-random features, are harbored in proteins and complexes that enable spatial and temporal sorting of biochemical activities. Protein complexes within the nucleus recruit specific factors through top-scoring IDRs. These IDRs are frequently disrupted by cancer-associated mutations and fusion oncoproteins. Overall, GIN enables the decoding of sequence-function relationships of IDRs and can be deployed in IDR-specific and IDRome-wide analyses.
