A protein interactome for the last eukaryotic common ancestor illuminates the biochemical basis of modern genetic diseases
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
All eukaryotes share a single-celled ancestor from ∼1.5–1.8 billion years ago, the Last Eukaryotic Common Ancestor (LECA). Roughly half of gene families found in modern eukaryotes were already present in LECA, forming molecular systems that continue to influence genetic diseases and traits today. To investigate these systems, we compared genes across 156 organisms to define a core set of protein-coding gene families likely present in LECA, with a quarter remaining uncharacterized. Integrating >26,000 mass spectrometry proteomics analyses from 31 species, we inferred higher-order complexes among these ancient proteins. This reconstructed interactome reveals both established and novel assemblies, offering a biochemical snapshot of LECA’s organization. Finally, by exploring these ancient protein interactions, we found new human gene-disease associations for bone density and congenital birth defects, illustrating the value of ancestral protein networks for modern functional genetics.
Article activity feed
-
Figure 4.
In the text, the larger circles are described as representing different levels of protein complex assembly and interaction; it would be helpful to state this in the figure legend as well. It would also be good to say whether the spatial arrangement of the circles is meaningful: do two touching circles share something that distant circles do not? In addition, the key labels the small circles as complexes, whereas the legend describes them as individual proteins; this should be clarified. Finally, it would be good to distinguish uncharacterized genes that nonetheless have a known function in UniProt from those that do not, especially since some complexes made up entirely of uncharacterized proteins have annotated functions in this figure.
-
31 diverse eukaryotic species
Firstly, I want to commend you for sampling Viridiplantae so thoroughly in this reconstruction. These organisms are often skipped or undersampled, especially when the focus of an evolutionary study is human disease. I think it would be interesting to include more diversity within these groups: for example, red and green algae within Archaeplastida, which both add important early-diverging context to ancestral reconstruction, and, within TSAR, brown algae or at least more species outside the genus Plasmodium.
-
while previous studies reported ∼500 OGs [45,46].
Looking at these references, the second one does not seem to focus on LECA but on a branch within Amorphea. The first does, but it reports a number closer to 700. I understand that these may well be the best available references for estimating the expected number of OGs. If so, it might be more informative to show that the genes identified in these studies fall into the same OGs as those you identified (or to quantify how closely the two sets overlap).
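To make the suggested overlap quantification concrete, here is a minimal Python sketch. It assumes the OGs from each study have already been mapped to a shared set of identifiers; the identifiers, the helper names `og_overlap` and `overlap_pvalue`, and the background size `universe` (the number of candidate OGs either study could have reported) are all hypothetical. It reports shared and unique counts, the Jaccard index, and the overlap coefficient, plus a one-sided hypergeometric test of whether the overlap exceeds what a random draw of the same size would give.

```python
from math import comb


def og_overlap(ours: set[str], previous: set[str]) -> dict[str, float]:
    """Summarize how two sets of orthogroup (OG) identifiers overlap."""
    shared = ours & previous
    union = ours | previous
    smaller = min(len(ours), len(previous))
    return {
        "shared": len(shared),
        "only_ours": len(ours - previous),
        "only_previous": len(previous - ours),
        # Jaccard index: |A & B| / |A | B|; 1.0 means the sets are identical.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Overlap coefficient: |A & B| / min(|A|, |B|); 1.0 means one set contains the other.
        "overlap_coeff": len(shared) / smaller if smaller else 0.0,
    }


def overlap_pvalue(k: int, n_ours: int, n_previous: int, universe: int) -> float:
    """One-sided hypergeometric P(X >= k): the chance of sharing at least k OGs
    if the previous set were a random draw of n_previous OGs from `universe`
    candidates, n_ours of which belong to our set."""
    total = comb(universe, n_previous)
    upper = min(n_ours, n_previous)
    hits = sum(
        comb(n_ours, i) * comb(universe - n_ours, n_previous - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Hypothetical OG identifiers and background size, for illustration only.
ours = {"OG1", "OG2", "OG3", "OG4"}
previous = {"OG2", "OG3", "OG5"}
stats = og_overlap(ours, previous)
print(stats)
print(overlap_pvalue(stats["shared"], len(ours), len(previous), universe=10))
```

The overlap coefficient is less sensitive than the Jaccard index to a large difference in set sizes, which seems relevant when comparing sets of ∼500 and ∼700 OGs against a larger one; the p-value depends heavily on how `universe` is defined.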
-
Figure 2.
This is a small thing, but I noticed that LECA is marked at the tip of the root branch of your phylogeny. I believe it should instead be at the most ancestral node, at the other end of the root branch, where Amorphea and Archaeplastida meet.
-
Finally, by exploring these ancient protein interactions, we found new human gene-disease associations for bone density and congenital birth defects, illustrating the value of ancestral protein networks for modern functional genetics.
This is super cool work leveraging conservation across deep evolutionary timescales. The core thesis, that protein-protein interactions (PPIs) old enough (and detectable) across these timescales should be fruitful for genotype-phenotype mapping in contexts like human disease, makes sense overall. The fact that you can point to specific examples where this works is also quite remarkable. However, I wonder whether there is an interesting follow-up that would more directly test the hypothesis that ancient homology implies relevance to disease. The approach outlined here may be just as useful on shorter (but still deep) timescales, e.g., across vertebrates, where homology and conservation might still be informative but there might be more power to pick up signal on these sorts of relationships. It would be extremely interesting to see whether, e.g., Fig. 6A changes as you alter the divergence timescale of the selected species. Going shallower would likely come with the drawback of adding more noise to the analyses, but this relationship is worth interrogating explicitly.
-