Microsatellite expansions hidden within the human dark genome are translated in novel and toxic proteins causing muscle and neurodegenerative diseases
Abstract
The vast majority of the human genome is non-coding, with one-half composed of repeated DNA elements, including microsatellites, which are short repeated sequences of 1 to 6 nucleotides. Expansion of a subset of these microsatellites is the cause of over 60 neurological diseases. However, most of these short tandem repeat expansions are located in sequences annotated as non-coding, raising the question of how these mutations are pathogenic. Here, we found that GGC repeat expansions causing various neurological diseases, including oculopharyngodistal myopathy with or without leukoencephalopathy (OPDM/OPML) and neuronal intranuclear inclusion disease (NIID), while embedded in sequences considered non-coding, are in reality located within small and previously unrecognized ORFs, resulting in their translation into novel and diverse polyglycine-containing proteins. Antibodies developed against these proteins stain the p62-positive inclusions typical of these diseases. Importantly, expression of these polyglycine-containing proteins alone recapitulates key features of OPDM/OPML/NIID, namely the formation of p62-positive protein aggregates and locomotor and skeletal muscle alterations associated with neurodegeneration in cell, fly and mouse models. Moreover, these polyglycine proteins show unexpected variations in their interactants, half-life, aggregation and toxicity. These results highlight a key role of the specific ORF sequences hosting the GGC repeats in modulating the aggregation and toxic properties of their central polyglycine core. Finally, we identified a pharmacological compound targeting expression of these polyglycine proteins, raising hope for the development of a common therapy for these neuromuscular and neurodegenerative diseases.
Overall, these results uncover a common and unified pathogenic mechanism for diverse neurological diseases in which expansions of GGC repeats are translated into novel and toxic polyglycine-containing proteins that drive the formation of aggregates as well as neuronal and muscle cell dysfunction. Moreover, this work highlights the complexity and richness of the human “dark” proteome and the importance, in human pathologies, of mutations in as yet unrecognized small ORFs that result in the expression of novel and pathogenic proteins.