Agl24 is an ancient archaeal homolog of the eukaryotic N-glycan chitobiose synthesis enzymes

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The authors identified the enzyme involved in the transfer of the second GlcNAc residue on the nascent oligosaccharide in protein N-glycosylation of the thermophilic Crenarchaeon Sulfolobus acidocaldarius. Although N-glycosylation is well-known in Euryarchaeota, the enzymes involved in this process, their substrates, and the mechanisms followed to produce the mature glycan are still elusive in Crenarchaeota. This work will impact the community interested in glycoprotein biogenesis and evolution. The experiments reported in this study were well-performed and the results are solid, but text and figure editing is required to improve accuracy and readability and to strengthen the message of the work.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Protein N-glycosylation is a post-translational modification found in organisms of all domains of life. Crenarchaeal N-glycosylation begins with the synthesis of a lipid-linked chitobiose core structure, identical to that in Eukaryotes, although the enzyme catalyzing this reaction has remained unknown. Here, we report the identification of a thermostable archaeal β-1,4-N-acetylglucosaminyltransferase, named archaeal glycosylation enzyme 24 (Agl24), responsible for the synthesis of the N-glycan chitobiose core. Biochemical characterization confirmed its function as an inverting β-D-GlcNAc-(1→4)-α-D-GlcNAc-diphosphodolichol glycosyltransferase. Substitution of a conserved histidine residue, found also in the eukaryotic and bacterial homologs, demonstrated its functional importance for Agl24. Furthermore, bioinformatics and structural modeling revealed similarities of Agl24 to the eukaryotic Alg14/13 and a distant relation to the bacterial MurG, which catalyze the same or a similar reaction, respectively. Phylogenetic analysis of Alg14/13 homologs indicates that they are ancient in Eukaryotes, acquired either by lateral transfer or through eukaryogenesis.

Article activity feed

  1. Author Response:

    Reviewer #2 (Public Review):

    The paper by Meyer and collaborators first describes the in silico identification of a putative beta 1-4-N-acetylglucosaminyltransferase in the model Crenarchaeon Sulfolobus acidocaldarius. Beta 1-4-N-acetylglucosaminyltransferases are involved in the N-glycosylation pathway for the synthesis of glycoproteins. To detect this enzyme, the authors have used as baits the bacterial enzyme MurG and the eukaryotic enzymes Alg13 and Alg14. These enzymes have no detectable similarities and it was not possible to detect their Sulfolobus homologs by simple BLAST search. However, they detected several putative candidates with very low sequence identity (10-17%) using Delta-BLAST. They selected one of them which was retrieved using the Alg14 protein of Saccharomyces cerevisiae as bait.

    We thank the reviewer for this comment. We tested all identified candidates by deletion approaches. Saci1262 was the most promising candidate, but its function could not be determined by the deletion approach. To confirm its function, we developed the in vitro assay described in the manuscript. We have restructured the text to make this point clearer.

    They report that the overall topology of this candidate protein is identical to those of MurG and that the N and C terminal part of the protein could correspond to the eukaryotic proteins Alg14 and Alg13, respectively. They give the name Agl24 to this protein (why 24?) and then describe the enzyme as if its identification was already demonstrated. Closely related homologues of this protein are present in most Crenarchaeota, except in Thermoproteales, but absent in other archaea (Fig. S7).

    We agree that this was confusing; in the text, Saci1262 is now renamed Agl24 only after its function has been confirmed. Based on the proposal for the naming of N-glycosylation pathway components in Archaea (Eichler et al. “A proposal for the naming of N-glycosylation pathway components in Archaea”. Glycobiology 23:620–621), the enzyme was named archaeal glycosylation enzyme 24 (Agl24).

    At this stage of the manuscript (line 153), the identification of Agl24 is only based on the detection of several patches of conserved amino acids between the different enzymes (Fig. 1, B and D) and on a structural model that fits the structure of the homologous enzymes. I suggest that the authors slightly change this part of the manuscript by first describing how they modelled the structure of their candidate enzyme (this is not indicated in the M&M) and how they identified these conserved amino acids.

    We have added the process used to obtain the structural model to the Methods section.

    They can conclude that the structural and sequence similarities (although low) suggest that they have selected the right candidate and name it (then follow up with their detailed comparison of the different enzymes). An interesting result is the identification of a motif (GGxGGH) conserved between Agl24, MurG and Alg14. If this is really the first time that this motif has been detected, thanks to the identification of Agl24, it deserves much more emphasis, especially since the authors demonstrate later on that the histidine is essential.

    In this manuscript we report for the first time the sequence conservation of this motif in bacterial, eukaryotic and archaeal homologs. However, the motif has been described as an extended motif G13GTGGHX2PXLAXAX2LX9G39 in MurG sequences (Crouvoisier et al., 2007). This is now mentioned in the text.
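
    As an illustration of how a short motif such as GGxGGH can be located in a protein sequence, here is a minimal sketch using a regular expression. The function name `find_motif` and the toy sequence are hypothetical and are not taken from the manuscript; real searches would normally be run on aligned homologs rather than on a single raw sequence.

```python
import re

# GGxGGH: two glycines, any single residue, two glycines, then a histidine.
MOTIF = re.compile(r"GG.GGH")


def find_motif(sequence):
    """Return (1-based start position, matched text) for each non-overlapping GGxGGH hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration only (not a real Agl24, Alg14 or MurG sequence).
toy = "MKILFVGGSGGHLTPALA"
hits = find_motif(toy)
```

    The pattern only checks the six-residue core, so it would also match the GTGGH-containing start of the extended MurG motif quoted above, but it says nothing about the residues that follow.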

    Line 37 of the abstract: this sentence is ambiguous, as there is no strong similarity between Agl24 and eukaryotic Alg13/14. There is possibly strong structural similarity, but it is not obvious from Figure 2A that it is higher than with MurG. Is it possible to quantify these similarities? It could also be wise to compare with the structure of a bacterial EpsF, since, from the phylogenetic analysis of the authors (see below), Agl24 could also exhibit more sequence similarities with EpsF than with MurG.

    We agree that this sentence in the abstract was very ambiguous and it has now been modified. The structural similarity can be quantified: it is represented by the Root Mean Square Deviation (RMSD) values of the C-alpha backbone atoms of our model compared to the published structures (now added to Table S2) or by the Global Model Quality Estimation (GMQE) score. The GMQE score is expressed as a number between 0 and 1, reflecting the expected accuracy of a model built with that alignment and template. GMQE is coverage dependent, and models covering only half of the target sequence are unlikely to get a score above 0.5. The values obtained for Agl24 are given in Table S2 and range from 0.46 to 0.56. We also performed the structural prediction with AlphaFold, which resulted in a prediction very similar to that obtained with SWISS-MODEL (RMSD: 3.80). These findings are now reported in the manuscript.
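
    For readers unfamiliar with the RMSD measure mentioned above, the following sketch shows the basic calculation, RMSD = sqrt((1/N) * sum of squared distances between paired atoms). It assumes the two structures have already been superposed and that the C-alpha atoms are paired residue by residue; structure comparison tools perform that superposition and pairing first, and this sketch does not. The function name `ca_rmsd` is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD over paired C-alpha coordinates, in the same units as the input.

    Assumption: both lists hold (x, y, z) tuples that are already superposed
    and listed in matching residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

    Because every paired atom contributes equally, a few badly placed loops can dominate the value, which is one reason a coverage-aware score such as GMQE is a useful complement.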

    No EpsE or EpsF GT structures are published, therefore we could not compare those to Saci1262.

    In my opinion, the next section should not be "Agl24 is essential" but the biochemical characterization of the enzyme, which confirms the in silico prediction. The authors have produced and purified a recombinant Agl24 enzyme. They show that the enzyme is membrane-bound and has an inverting beta-1,4-N-acetylglucosamine-transferase activity. The section "Agl24 is essential…" could possibly be removed and the result mentioned in the discussion, since it is not conclusive concerning the biological role of Agl24 but confirms the previous observation of Zhang and colleagues made by transposon mutagenesis on the Agl24 homologue in Sulfolobus islandicus.

    We thank the reviewer for this important comment. We have added a short passage to highlight that the essentiality of the candidate genes suggested that they could include the N-glycosylation enzyme, since our previous studies have indicated that N-glycosylation is essential in Sulfolobus.

    In comparison with the first part of this paper, the last part, dealing with evolutionary aspects, raises many problems. The authors do not seem familiar with evolutionary concepts, as indicated by the use of the term "lower eukaryotes" (lines 163, 175), which is not used by evolutionists since it is highly biased (a bit racist!), animals, including ourselves, being "higher" eukaryotes. This weakness is also apparent from the use of the expression "conserved from yeast to human" (line 57). Yeast and human belong to the same eukaryotic subdivision (Opisthokonts), so this conservation does not testify to the presence of an enzyme in the Last Eukaryotic Common Ancestor (LECA). Similarly, the authors often limit their description of "lower eukaryotes" to Leishmania/Trypanosoma or Dictyostelium/Entamoeba. A more extensive survey of the Alg13/14 topology in all major eukaryotic groups would be necessary to conclude, for instance, that the protein was a monomer or a dimer in LECA.

    While we agree that the term “lower” is not technically correct (and probably slightly speciesist), it is (even historically) a common colloquialism in the literature. Nonetheless, all instances of the term have been removed or changed to “deep-branching” or more appropriate terminology. The mentions of non-Opisthokont Eukaryotes were limited to these particular Kinetoplastids and Amoebozoa, mainly because their genes are fused, unlike those of other Eukaryotes. However, it is evident from both earlier work (Lombard, 2016) and our updated analyses that the distribution of these genes among Eukaryotes is much wider. Specifically, line 57 was referring to (Lehle et al., 2006) and is not linked to our inference of an early presence of these genes in Eukaryotes in general.

    In the revised manuscript, we have searched for homologs in 1611 eukaryotic genomes, and demonstrated a very wide presence of the Alg14-13 proteins among them (Supplementary Data 1). Of these, we used a subset covering the known eukaryotic diversity and from these data there seem to be specific fusion events within the Eukaryotes. Similarly, any fused sequences in Archaea and Bacteria are sparse independent events and we can conclude that at the original acquisition (in LECA or otherwise), the genes were split.

    The authors have performed two single gene phylogenetic analyses of several groups of Agl24 homologues, including either the eukaryotic Alg13 or Alg14, which correspond to the N and C-terminal domains of Agl24. These phylogenies are valid to identify different subgroups of enzymes, but they are not reliable to provide real information about the evolutionary relationships between these different groups and within these groups (for instance, they did not recover the strong clade formed by Thaumarchaeota, Bathyarchaeota and Aigarchaeota which is present in all robust archaeal phylogenies, see for instance Adam et al., PMID: 28777382). This is not surprising considering the small size of the genes and the very low similarities between the different subgroups. Since the two phylogenies are rather congruent, in particular for the identification of the different subgroups, it could be interesting to perform a concatenation (removing Methanopyrus kandleri, which is a fast-evolving species and disturbs the phylogeny with Alg14).

    We agree with the reviewer that the internal topology of the TACK for Alg14-13 does not reflect the reference tree. As pointed out, these sequences are rather short and not very well conserved, which is reflected in our revised phylogenies where almost none of the relationships among TACK clades are strongly supported. Furthermore, as is evident from the various Asgard and the Thermofilales clades, ancient lateral transfer events are not out of the question. However, recovering a strongly supported TACK clade points towards an ancestral presence of that particular Agl24/Alg14-13 homolog in the lineage.

    Concerning a concatenation, we have followed the reviewer’s suggestion and removed the M. kandleri sequence from Alg14. We disagree on the subject of congruence between the two phylogenies; especially for the relationship between Asgards and Eukaryotes, the two trees are incongruent, with the respective topologies being strongly supported. Although it is not good practice to proceed with a concatenation in such cases, we did it, if for no other reason than to test the effect of mashing the two phylogenetic signals together. The concatenated dataset phylogeny is shown in Supplementary Figures S13 and S14 (collapsed and uncollapsed, respectively). Putting the concatenated dataset together required some planning, since we had to match the subsampled taxa for Eukaryotes and Bacteria in Alg13 and Alg14 and fuse the sequences semi-manually, and in some cases invert the gene order in fused eukaryotic sequences (see Methods section).
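
    The taxon-matching and fusing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: each alignment is assumed to be a dictionary mapping a taxon name to its aligned sequence, only taxa present in both alignments are kept, and the Alg14-like block is always placed first so that the domain order is uniform. The function name `concatenate_alignments` and the toy taxa are hypothetical.

```python
def concatenate_alignments(alg14, alg13):
    """Join two alignments (dicts of taxon -> aligned sequence) over shared taxa.

    Assumption: the Alg14-like block always comes first, whatever the order
    of the two domains in the original (possibly fused) gene.
    """
    for name, aln in (("alg14", alg14), ("alg13", alg13)):
        # Every sequence within one alignment must have the same aligned length.
        if len({len(seq) for seq in aln.values()}) > 1:
            raise ValueError(f"{name} sequences are not all the same aligned length")
    shared = sorted(set(alg14) & set(alg13))
    return {taxon: alg14[taxon] + alg13[taxon] for taxon in shared}


# Hypothetical toy alignments; taxonC lacks an Alg13-like sequence and is dropped.
toy_alg14 = {"taxonA": "MK-L", "taxonB": "MKIL", "taxonC": "M-IL"}
toy_alg13 = {"taxonA": "GGH", "taxonB": "G-H"}
combined = concatenate_alignments(toy_alg14, toy_alg13)
```

    Dropping taxa that lack one of the two genes is a simplification; padding the missing block with gaps is a common alternative when sampling matters more than completeness.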

    I personally identify 4 subgroups in the two phylogenies that I will discuss in some detail below.

    Group 1: A first group includes a wide variety of archaea belonging to different phyla. Importantly, Euryarchaeota and other archaea (including one sequence of Odinarchaeota) are well separated. This group possibly corresponds to descendants of an ancestral enzyme that was present in the Last Archaeal Common Ancestor (LACA). If correct, this indicates the position of LACA in the two trees. Some Crenarchaeota are present in this part. Do they correspond to Thermoproteales, or do some Crenarchaeota have both Agl24 and this form?

    We are now able to address this point with our new analysis (mainly using the revised phylogenies), since our updated local database for Archaea contains far more genomes, including all Asgards from (Liu et al., 2021). Moreover, we searched for Alg14-13 homologs against local databases of Eukaryotes and Bacteria (1611 and 25118 genomes respectively), instead of simply relying on the sequences from (Lombard, 2016). Finally, we added the MurG sequences from the respective datasets of (Lombard, 2016) for outgroup rooting of the Alg14 and Alg13 trees, plus we performed outgroup-free rooting with the Minimal Ancestor Deviation method (Tria et al., 2017). The two approaches yield the same root as noted in Figure 7 (that would be without the MurG clade for the MAD root).

    “Group 1”: In the revised rooted phylogenies, this group is sister to the clade containing all other homologs (excluding MurG). It can be divided into two subclades. One consists of mainly methanogenic Euryarchaeota interspersed with other Archaea, mainly DPANN. At least for Methanosarcinales and Methanococcales (and perhaps Methanomicrobiales), the distribution is wide enough and the clades strongly supported, so we can trace the Alg14-13 homolog to the base of each of these lineages but not necessarily to all Euryarchaeota. That is due to the homologs missing from other lineages and disagreements with archaeal reference phylogenies (e.g., Methanomicrobiales and Methanosarcinales should have been together and monophyletic, without the Methanococcales). Any further inferences are problematic, since we have made clear that we mainly stuck with the clades from (Lombard, 2016) and very divergent Alg14-13-like homologs exist throughout Archaea including, for example, in Bathyarchaeota, two in Methanobacteriales.

    The second subclade corresponds to the TACK superphylum with the addition of several Asgard branches therein. The latter correspond to multiple independent ancient lateral transfers from the TACK to different Asgard lineages. There exists a second divergent Thaumarchaeota clade, whose position is inconsistent between the Alg13/EpsF and Alg14/EpsE trees.

    We understand the reviewer’s inclination to label this clade as having emerged at the LACA. However, we believe that our rooted phylogenies suggest that the node corresponding to the LACA is deeper.

    Group 2: A second group corresponds to orthologues of Agl24 and includes Crenarchaeota (Sulfolobales, Desulfurococcales) and Bathyarchaeota. Questions: How many Bathyarchaeota? Are they widespread in Bathyarchaeota? Since Bathyarchaeota are also present in group 1 (same questions), could these group 2 Bathyarchaeota correspond to MAG contamination with Crenarchaeota? Or LGT between Crenarchaeota and Bathyarchaeota?

    “Group 2”: The reviewer’s description of this clade is fairly accurate. In the updated phylogenies, it is still composed primarily of Crenarchaeota with one Baldrarchaeota and a strongly supported cluster of Bathyarchaeota sequences (14 in total, mainly from o__B26-1) that indicates a single transfer event (Supplementary Figures 11, 12, 14) from Crenarchaeota to Bathyarchaeota. A number of the Bathyarchaeota genomes in “Group 2” also contain homologs in “Group 1”. The Thermofilales member B67-G16 also possesses one homolog in each of these groups.

    It is noteworthy that there was another very divergent clade composed of Bathyarchaeota (mainly members of g__PALSA-986 but others as well) in the preliminary phylogenies that were used for dataset cleanup/curation; this clade was omitted from the final datasets. Some of the sequences in that clade are also additional homologs to the ones included in our phylogenies. However, further examination of the Agl24-like homologs in Bathyarchaeota is beyond the aims of this study.

    The enzymes of group 2 were probably not present in LACA, unless they have evolved more rapidly than the other bona fide archaeal enzymes of group 1 (drastic modification of their function?). More likely, they were introduced at the base of the Sulfo/Desulfo clade from an unknown source (extinct lineage?).

    We mostly agree with that deduction. However, since most Crenarchaeota only contain a single Agl24-like homolog (unlike some Bathyarchaeota) and we know nothing about the function of “Group 1” enzymes, we would like to avoid further conjecture.

    Group 3: A third group corresponds to EpsF in Archaea and Bacteria. Questions: How widespread is it in Bacteria? Was it present in the Last Bacterial Common Ancestor? These enzymes are sister group to the eukaryotic enzymes. Are they their orthologues? Could it be that the eukaryotic enzymes originated from bacterial EpsF via mitochondria?

    Group 4: A fourth group includes all Eukaryotes (monophyletic) and very few sequences of archaeal MAGs belonging to different phyla: a few Thorarchaeota and one Odinarchaeota, but also several Verstraetearchaeota, one Geothermarchaeota, and one Micrarchaeota (DPANN). Line 39 in the abstract is thus misleading, since the phylogenetic analysis revealed similar sequences not only in two phyla of Asgard but also in Verstraetearchaeota, Geothermarchaeota, and Micrarchaeota! Moreover, these similarities remain very low. This does not fit with the classical situation observed in the universal tree of life, in which archaeal and eukaryotic proteins always exhibit a high level of similarity.

    “Groups 3 & 4”: We would like to address these two groups together.

    Firstly, bacterial sequences in the revised phylogenies are no longer confined to the EpsE/F-like clade (“Group 3”) but a few sequences are found among the Archaea of “Group 4”. When subsampling the non-MurG clades of our preliminary bacterial trees (Supplementary Data 1) we saw that the EpsE/F-like and Alg14-13-like homologs are quite ancient in some lineages (e.g., Actinobacteria, Cyanobacteria, some Firmicutes subgroups) but when we put sequences from all three domains together, there seem to have been multiple interdomain transfers between Bacteria and Archaea. The archaeal “Group 3” sequences, with the exception of some methanogens and Thermococcales are mainly from DPANN; there even exists a distinct Diapherotrites branch (Supplementary Figures S11, S12). Even though we cannot pinpoint its exact origin, as it is found among archaeal clades from “Groups 2 and 4”, “Group 3” didn’t emerge at the LBCA and is of archaeal origin.

    Other than the addition of a few bacterial sequences, the reviewer’s description of “Group 4” seems accurate. Nevertheless, we are not particularly surprised by the extremely low sequence conservation between Eukaryotes and the various archaeal lineages, seeing how many divergent homologs we have found throughout this analysis. The abstract has been altered to remove the mention of “similar” sequences, and all mentions of eukaryogenesis in both the abstract and manuscript have been emended based on the revised phylogenies.

    At the moment we have no way of testing whether Eukaryotes acquired Alg14-13 through mitochondria, in spite of the few bacterial sequences in “Group 4”. All our data point toward an archaeal origin (see below).

    The authors suggest a split (red arrow) at the origin of Groups 3 and 4. However, since the tree is unrooted, can one exclude a fusion at the origin of groups 1 and 2?

    More importantly, it is profoundly misleading to conclude from this analysis that eukaryotes emerge from Asgardarchaeota!!!! The position of the lonely archaeal sequences in group 4 suggests either problems of MAG reconstruction (contamination, recombination, misannotation) or, more interestingly, independent LGT of proto Alg13/14 from proto-eukaryotes to these archaeal lineages. Moreover, the few sequences of Asgard present in group 4 only correspond to two Asgard phyla, while the number of Asgard phyla has skyrocketed in recent years. I did a rapid BLAST search, and homologues of the Thorarchaeal sequences of group 4 are absent not only in Heimdall and Loki but also in Hela and Gerda. The authors could contact two Chinese groups who recently published preprints describing several additional Asgard phyla (Liu, Y et al. BioRxiv 2020, Xie R et al., BioRxiv 2021).

    We thank the reviewer for this comment and for sharing their thoughts with us, which ultimately spurred us to update our local databases and increase the number of Asgard genomes, without which the additional Asgard sequences in “Groups 1 and 2” would not have been found. On the issue of Archaea in “Group 4”, the monophyletic eukaryotic clade is found within a primarily archaeal clade, surrounded by several other archaeal homologs throughout the phylogeny, and not at root-level. While we did not go in-depth to check all the included Asgard MAGs for problems, the Alg14-13-like homologs are extremely widespread in the various archaeal lineages, covering essentially all Thorarchaeota, Odinarchaeota, and Verstraetearchaeota, which in our opinion would require a far-fetched series of MAG problems. Despite the incongruence on what the sister clade of Eukaryotes is (Odinarchaeota or Thorarchaeota), on the basis of the data we cannot infer that a transfer from Eukaryotes (proto- or otherwise) to Archaea occurred.

    Obviously, the authors have chosen to fit their paper into the mold of the now popular two domains (2D) scenario in which Eukaryotes emerged from Asgardarchaeota. There is presently a debate between proponents of the 2D and 3D (classical Woese) universal tree of life. The authors are obviously strong proponents of the 2D since they don't mention any of the papers that have recently supported the 3D scenario (Da Cunha et al., 2017, 2018). From line 457 to the end of the paper, all the discussion revolves around the 2D model and the Asgard origin of eukaryotes! They possibly consider that the debate has been closed by the paper of Williams and colleagues (Nat Ecol Evol, 2020), who criticized the work of Da Cunha and colleagues. They should notice that Williams et al. still obtained a 3D tree with RNA polymerase (supplementary figure 1) except when they used amino-acid recoding, a method that reduces the phylogenetic signal (Hernandez and Ryan, BioRxiv, 2020). A 3D tree was again obtained with the RNA polymerase (including those of giant viruses and the three eukaryotic RNA polymerases) by Guglielmini et al., PNAS, 2019. The debate should thus be considered as still open.

    In any case, the phylogenies presented by the authors are not universal trees of life and cannot be used in the 2D versus 3D debate. A proponent of the 3D scenario would say that the Odin sequence present in group 1 corresponds to the real position of Asgardarchaeota, in agreement with the results of Da Cunha et al. (2017), who found that Asgardarchaeota are not sister group to eukaryotes but branch deep within archaea.

    We would like to abstain from criticizing any previously published phylogenies on 2D or 3D trees of life. The main point of our paper was to highlight the similarities in N-glycosylation between our crenarchaeal Agl24 and Eukaryotes.

    While we did point out a probable acquisition during eukaryogenesis, as would be implied by a 2D scenario, at no point in the original submission did we try to specifically address the 2D/3D debate, or go out of our way to discredit a 3D tree. That said, we are pointing out in the revised manuscript that the new phylogenies cannot directly support a 3D model. The Eukaryote branch covers their known diversity (barring taxa where we did not find any Alg14-13 homologs; Supplementary Data 1). With the exception of a Perkinsela sequence in Alg13, Eukaryotes are monophyletic and seem to emerge from within Archaea in all phylogenies. Moreover, in the rooted trees, they are nowhere near forming a sister clade to archaeal homologs (all clades or even a subset of them). We hope to make clear in this version of the manuscript that the Asgard group forming the sister clade to Eukaryotes is inconsistent and not Heimdallarchaeota, as would be expected from most published 2D trees. We did not even find any Alg14-13 homologs in Heimdallarchaeota. Furthermore, the internal branching of Eukaryotes (Supplementary Figures S11, S12) does not even recover their major branches. It is unknown whether this is due to transfers or just poor signal, since sequence conservation in eukaryotic Alg14-13 is very low. Thus, even though we maintain that acquisition through eukaryogenesis in a 2D tree is a possibility, our results are also compatible with a lateral transfer from Archaea very early in the history of Eukaryotes, which is 2D/3D-agnostic.

    Since the enzymes studied here are apparently absent in most Asgards, it is profoundly misleading to label Asgard the group close to eukaryote in the Cover art and to have a highlight claiming that eukaryotic Alg13/14 are closely related to the Asgard homologs, suggesting their acquisition during eukaryogenesis, since the number of these Asgard homologues are very limited.

    Indeed, the cover art could have been misleading; as such, we have removed it. See other parts of this response concerning how we handled the sampling of Asgard genomes and the new results.

    It is also profoundly misleading to conclude in the title that their result "strengthens the hypothesis of an archaeal origin of the eukaryal N-glycosylation". One can only say that archaeal and eukaryotic N-glycosylation pathways are evolutionarily related.

    We disagree with this comment: even if they did not originate during eukaryogenesis, the eukaryotic homologs are clearly archaeal in origin.

    However, in the case of Alg13/Alg14, it seems that these eukaryotic proteins are more closely related to bacterial enzymes (EpsF) than to their archaeal homologues (group 1 and 2)! We would like to know more about the phylogeny and distribution of EpsF in Bacteria and Archaea. According to the authors, they are only present in Euryarchaeota but widespread in Bacteria, suggesting a LGT from Bacteria to Archaea. Was this enzyme present in the Last bacterial common ancestor? In summary, the authors conclusions and formulations on the evolutionary part of their paper, especially in the title, the summary, the discussion and the Cover Art are misleading and should be corrected.

    We have now addressed the issue of EpsEF and their relationship with Alg14-13. To recap the contents of the manuscript and this response, EpsEF and Alg14/13-related sequences are very widespread in Bacteria but do not date to the LBCA. The EpsEF+Alg14-13 clade (“Groups 3 & 4”) is sister to Agl24 and archaeal in origin, although there exist multiple interdomain transfers between Archaea and Bacteria. EpsEF sequences are found in Euryarchaeota but also in DPANN. There exists a single gene split event at the base of the EpsEF-Alg14/13 clade, followed by multiple independent gene fusion events in all three domains.

    Reviewer #3 (Public Review):

    Meyer, Benjamin et al. identified the enzyme involved in the transfer of the second GlcNAc residue on the nascent oligosaccharide in protein N-glycosylation of the thermophilic Crenarchaeon Sulfolobus acidocaldarius. Although N-glycosylation is well-known in Euryarchaeota, the enzymes involved in this process, their substrates, and the mechanisms followed to produce the mature glycan are still elusive in Crenarchaeota, a phylum belonging to the TACK archaeal superphylum, which also contains Thaumarchaeota, Aigarchaeota, and Korarchaeota. The authors, by screening the data banks with the sequences of the bacterial MurG and yeast Alg13/Alg14, which catalyze the transfer of GlcNAc in N-glycosylation, identified a gene, saci1262 (named agl24), showing very low identity. The authors characterized the product of this gene in depth with a very complete approach. Firstly, the authors could demonstrate by molecular modelling that the Agl24 enzyme shows a 3D structure similar to those of MurG and Alg13/Alg14, and catalytic residues similar to the latter enzyme. The functional characterization was very complete, showing that agl24 is essential in vivo, and that the recombinant Agl24 specifically uses UDP-GlcNAc and lipid-GlcNAc as donor and acceptor substrates, respectively. In addition, the enzyme was thermophilic, did not require metals for catalysis, and followed an 'inverting' reaction mechanism in which the anomeric configuration of the product is the opposite to that of the substrate. Experiments of site-directed mutagenesis demonstrated that His14 is essential for catalysis, as predicted by multiple sequence alignments and inspection of the 3D models, while the role of Glu114, also invariant, remained obscure. Then, the phylogenetic analysis of Alg13/Alg14 on the TACK archaeal superphylum showed that Agl24 homologs are widespread among Archaea, suggesting that N-glycosylation in Eukaryotes was inherited from an ancient archaeal ancestor. This observation fostered the hypothesis that the first eukaryotic cell originated from the Asgard superphylum.

    Strengths:

    The main question of the work, which is the enzyme involved in the first crucial step of protein glycosylation in Archaea? is of general interest in glycobiology. Although this process, in the past believed a peculiarity of Eukaryotes, has been well studied in Euryarchaeota, it is almost unknown in TACK superphylum that, being considered the closest to the Last Eukaryotic Common Ancestor, is a very interesting matter of study. The work shows several strengths:

    1. The authors unequivocally demonstrated that Agl24 is the enzyme catalyzing the transfer of the second GlcNAc unit on the nascent oligosaccharide, thereby completing the puzzle of the first step of N-glycosylation, for which only the AglH enzyme was known so far.
    2. The approach used to identify Agl24, the choice of the model system, and the characterization of the enzyme are absolutely excellent and set a new standard to study N-glycosylation in Archaea.
    3. The identification and characterization of a novel Glycosyl Transferase is of great importance in glycobiology. GTs are elusive enzymes, difficult to purify, due to their instability and association to membranes, and to characterize because of their extreme specificity for donor and acceptor substrates. In addition, GT enzymatic assays use very expensive substrates and very laborious procedures. For this reason, characterized GTs are by far less common than, for instance, glycoside hydrolases. This study is a milestone for glycobiology. GTs from thermophilic microorganisms could be interesting subjects of study in general. Thermostable GTs could be easier to purify and characterize than their mesophilic counterparts.
    4. The in vivo knock-out of the agl24 gene was possible because the S. acidocaldarius model system is one of the few Crenarchaeota for which reliable molecular genetics tools can be used. These experiments confirmed that N-glycosylation is essential in Crenarchaeota, as previously shown for AglH and AglB.

    Weaknesses:

    There are not many weaknesses in this work.

    1. How the characterization of Agl24 is directly connected to the evidence that N-glycosylation in Eukaryotes was inherited from an ancestral archaeal cell should be better explained.

    We thank the reviewer for this suggestion. We believe that the revised manuscript and phylogenies clarify how the eukaryotic Alg14-13 homologs are of archaeal origin (being nested within an otherwise archaeal clade) and the possible scenarios for their origin (eukaryogenesis or transfer).

    2. The novelty of the presence of Alg13/14 and Agl24 homologues in the TACK superphylum shown in this paper should be discussed in comparison with the available literature.

    We have added a new paragraph and references to the discussion to highlight the novelty of Agl24, also adding the information that Agl24 will be assigned to a new GT family (see below).

    3. The Cover Art should be revised. The 'take home message' is not clear and the phylogenetic interdependencies of the different superphyla are a bit confusing.

    We have removed the cover art, as it might be more confusing than helpful.

  2. Reviewer #3 (Public Review):

    Meyer, Benjamin et al. identified the enzyme involved in the transfer of the second GlcNAc residue onto the nascent oligosaccharide in protein N-glycosylation of the thermophilic Crenarchaeon Sulfolobus acidocaldarius. Although N-glycosylation is well known in Euryarchaeota, the enzymes involved in this process, their substrates, and the mechanisms followed to produce the mature glycan are still elusive in Crenarchaeota, a phylum belonging to the TACK archaeal superphylum, which also contains Thaumarchaeota, Aigarchaeota, and Korarchaeota. By screening the databases with the sequences of the bacterial MurG and yeast Alg13/Alg14, which catalyze the transfer of GlcNAc in N-glycosylation, the authors identified a gene, named saci1262 or agl24, showing very low sequence identity. The authors characterized the product of this gene in depth with a very complete approach. First, they demonstrated by molecular modelling that the Agl24 enzyme has a 3D structure similar to those of MurG and Alg13/Alg14, and catalytic residues similar to those of the latter. The functional characterization was very complete, showing that agl24 is essential in vivo and that recombinant Agl24 specifically uses UDP-GlcNAc and lipid-GlcNAc as donor and acceptor substrates, respectively. In addition, the enzyme was thermophilic, did not require metals for catalysis, and followed an 'inverting' reaction mechanism, in which the anomeric configuration of the product is opposite to that of the substrate. Site-directed mutagenesis experiments demonstrated that His14 is essential for catalysis, as predicted by multiple sequence alignments and inspection of the 3D models, while the role of Glu114, also invariant, remained obscure. Finally, the phylogenetic analysis of Alg13/Alg14 in the TACK archaeal superphylum showed that Agl24 homologs are widespread among Archaea, suggesting that N-glycosylation in Eukaryotes was inherited from an ancient archaeal ancestor. This observation fostered the hypothesis that the first eukaryotic cell originated from the Asgard superphylum.

    Strengths:

    The main question of the work, namely which enzyme is involved in the first crucial step of protein glycosylation in Archaea, is of general interest in glycobiology. Although this process, once believed to be peculiar to Eukaryotes, has been well studied in Euryarchaeota, it is almost unknown in the TACK superphylum, which, being considered the closest to the Last Eukaryotic Common Ancestor, is a very interesting subject of study. The work shows several strengths:

    1. The authors unequivocally demonstrated that Agl24 is the enzyme catalyzing the transfer of the second GlcNAc unit onto the nascent oligosaccharide, thereby completing the puzzle of the first step of N-glycosylation, for which only the AglH enzyme was known so far.

    2. The approach used to identify Agl24, the choice of the model system, and the characterization of the enzyme are absolutely excellent and set a new standard for studying N-glycosylation in Archaea.

    3. The identification and characterization of a novel glycosyltransferase (GT) is of great importance in glycobiology. GTs are elusive enzymes: they are difficult to purify, due to their instability and association with membranes, and difficult to characterize because of their extreme specificity for donor and acceptor substrates. In addition, GT enzymatic assays use very expensive substrates and very laborious procedures. For this reason, characterized GTs are far less common than, for instance, glycoside hydrolases. This study is a milestone for glycobiology. GTs from thermophilic microorganisms could be interesting subjects of study in general. Thermostable GTs could be easier to purify and characterize than their mesophilic counterparts.

    4. The in vivo knock-out of the agl24 gene was possible because the S. acidocaldarius model system is one of the few Crenarchaeota for which reliable molecular genetics tools can be used. These experiments confirmed that N-glycosylation is essential in Crenarchaeota, as previously shown for AglH and AglB.

    Weaknesses:

    There are not many weaknesses in this work.

    1. How the characterization of Agl24 is directly connected to the evidence that N-glycosylation in Eukaryotes was inherited from an ancestral archaeal cell should be better explained.

    2. The novelty of the presence of Alg13/14 and Agl24 homologues in the TACK superphylum shown in this paper should be discussed in comparison with the available literature.

    3. The Cover Art should be revised. The 'take home message' is not clear and the phylogenetic interdependencies of the different superphyla are a bit confusing.

  3. Reviewer #2 (Public Review):

    The paper by Meyer and collaborators first describes the in silico identification of a putative beta-1,4-N-acetylglucosaminyltransferase in the model Crenarchaeon Sulfolobus acidocaldarius. Beta-1,4-N-acetylglucosaminyltransferases are involved in the N-glycosylation pathway for the synthesis of glycoproteins. To detect this enzyme, the authors used as baits the bacterial enzyme MurG and the eukaryotic enzymes Alg13 and Alg14. These enzymes have no detectable similarities and it was not possible to detect their Sulfolobus homologs by simple BLAST search. However, the authors detected several putative candidates with very low sequence identity (10-17%) using Delta-BLAST. They selected one of them, which was retrieved using the Alg14 protein of Saccharomyces cerevisiae as bait. They report that the overall topology of this candidate protein is identical to that of MurG and that the N- and C-terminal parts of the protein could correspond to the eukaryotic proteins Alg14 and Alg13, respectively. They give the name Agl24 to this protein (why 24?) and then describe the enzyme as if its identification had already been demonstrated. Closely related homologues of this protein are present in most Crenarchaeota, except in Thermoproteales, but absent in other archaea (Fig. S7).

    At this stage of the manuscript (line 153), the identification of Agl24 is based only on the detection of several patches of conserved amino acids between the different enzymes (Fig. 1, B and D) and on a structural model that fits the structure of the homologous enzymes. I suggest that the authors slightly change this part of the manuscript by first describing how they modelled the structure of their candidate enzyme (this is not indicated in the M&M) and how they identified these conserved amino acids. They can then conclude that the structural and sequence similarities (although low) suggest that they have selected the right candidate, and name it (then follow up with their detailed comparison of the different enzymes). An interesting result is the identification of a motif (GGxGGH) conserved between Agl24, MurG and Alg14. If this is really the first time that this motif has been detected, thanks to the identification of Agl24, it deserves much more emphasis, especially since the authors demonstrate later on that the histidine is essential.

    Line 37 of the abstract: this sentence is ambiguous, since there is no strong sequence similarity between Agl24 and eukaryotic Alg13/14. There is possibly strong structural similarity, but it is not obvious from Figure 2A that it is higher than with MurG. Is it possible to quantify these similarities? It could also be wise to compare with the structure of a bacterial EpsF since, according to the authors' phylogenetic analysis (see below), Agl24 could also exhibit more sequence similarity with EpsF than with MurG.

    In my opinion, the next section should not be "Agl24 is essential" but the biochemical characterization of the enzyme, which confirms the in silico prediction. The authors have produced and purified a recombinant Agl24 enzyme. They show that the enzyme is membrane-bound and has an inverting beta-1,4-N-acetylglucosamine-transferase activity. The section "Agl24 is essential..." could possibly be removed and the results mentioned in the discussion, since they are not conclusive concerning the biological role of Agl24 but confirm the previous observation of Zhang and colleagues, made by transposon mutagenesis, on the Agl24 homologue in Sulfolobus islandicus.

    In comparison with the first part of this paper, the last part, dealing with evolutionary aspects, raises many problems. The authors do not seem familiar with evolutionary concepts, as indicated by the use of the term "lower eukaryotes" (lines 163, 175), which is not used by evolutionists since it is highly biased (a bit racist!), animals, including ourselves, being "higher" eukaryotes. This weakness is also apparent from the use of the expression "conserved from yeast to human" (line 57). Yeast and human belong to the same eukaryotic subdivision (Opisthokonts), so this conservation does not testify to the presence of an enzyme in the Last Eukaryotic Common Ancestor (LECA). Similarly, the authors often limit their description of "lower eukaryotes" to Leishmania/Trypanosoma or Dictyostelium/Entamoeba. A more extensive survey of the Alg13/14 topology in all major eukaryotic groups would be necessary to conclude, for instance, that the protein was a monomer or a dimer in LECA.

    The authors have performed two single-gene phylogenetic analyses of several groups of Agl24 homologues, including either the eukaryotic Alg13 or Alg14, which correspond to the N- and C-terminal domains of Agl24. These phylogenies are valid for identifying different subgroups of enzymes, but they are not reliable enough to provide real information about the evolutionary relationships between these groups and within them (for instance, they did not recover the strong clade formed by Thaumarchaeota, Bathyarchaeota and Aigarchaeota, which is present in all robust archaeal phylogenies; see for instance Adam et al., PMID: 28777382). This is not surprising considering the small size of the genes and the very low similarities between the different subgroups. Since the two phylogenies are rather congruent, in particular for the identification of the different subgroups, it could be interesting to perform a concatenation (removing Methanopyrus kandleri, which is a fast-evolving species and disturbs the phylogeny with Alg14).

    I personally identify 4 subgroups in the two phylogenies that I will discuss in some detail below.

    Group 1: A first group includes a wide variety of archaea belonging to different phyla. Importantly, Euryarchaeota and other archaea (including one sequence of Odinarchaeota) are well separated. This group possibly corresponds to descendants of an ancestral enzyme that was present in the Last Archaeal Common Ancestor (LACA). If correct, this indicates the position of LACA in the two trees. Some Crenarchaeota are present in this part. Do they correspond to Thermoproteales, or do some Crenarchaeota have both Agl24 and this form?

    Group 2: A second group corresponds to orthologues of Agl24 and includes Crenarchaeota (Sulfolobales, Desulfurococcales) and Bathyarchaeota. Questions: How many Bathyarchaeota? Are they widespread in Bathyarchaeota? Since Bathyarchaeota are also present in group 1 (same questions), could these group 2 Bathyarchaeota correspond to MAG contamination with Crenarchaeota? Or to LGT between Crenarchaeota and Bathyarchaeota?

    The enzymes of group 2 were probably not present in LACA, unless they have evolved more rapidly than the other bona fide archaeal enzymes of group 1 (drastic modification of their function?). More likely, they were introduced at the base of the Sulfo/Desulfo clade from an unknown source (extinct lineage?).

    Group 3: A third group corresponds to EpsF in Archaea and Bacteria. Questions: How widespread are they in Bacteria? Were they present in the Last Bacterial Common Ancestor? They are the sister group to the eukaryotic enzymes. Are they their orthologues? Could it be that the eukaryotic enzymes originated from bacterial EpsF via mitochondria?

    Group 4: A fourth group includes all Eukaryotes (monophyletic) and very few sequences of archaeal MAGs belonging to different phyla: a few Thorarchaeota and one Odinarchaeota, but also several Verstraetearchaeota, one Geothermarchaeota, and one Micrarchaeota (DPANN). Line 39 of the abstract is thus misleading, since the phylogenetic analysis revealed similar sequences not only in two phyla of Asgard but also in Verstraetearchaeota, Geothermarchaeota, and Micrarchaeota! Moreover, these similarities remain very low. This does not fit the classical situation observed in the universal tree of life, in which archaeal and eukaryotic proteins always exhibit a high level of similarity.

    The authors suggest a split (red arrow) at the origin of groups 3 and 4. However, since the tree is unrooted, one cannot exclude a fusion at the origin of groups 1 and 2.

    More importantly, it is profoundly misleading to conclude from this analysis that eukaryotes emerged from Asgardarchaeota! The position of the lonely archaeal sequences in group 4 suggests either problems of MAG reconstruction (contamination, recombination, misannotation) or, more interestingly, independent LGT of proto-Alg13/14 from proto-eukaryotes to these archaeal lineages. Moreover, the few Asgard sequences present in group 4 correspond to only two Asgard phyla, while the number of Asgard phyla has skyrocketed in recent years. I did a rapid BLAST search, and homologues of the Thorarchaeal sequences of group 4 are absent not only in Heimdall and Loki but also in Hela and Gerda. The authors could contact two Chinese groups who recently published preprints describing several additional Asgard phyla (Liu, Y et al. BioRxiv 2020, Xie R et al., BioRxiv 2021).

    Obviously, the authors have chosen to fit their paper into the mold of the now popular two domains (2D) scenario, in which Eukaryotes emerged from Asgardarchaeota. There is presently a debate between proponents of the 2D and 3D (classical Woese) universal tree of life. The authors are obviously strong proponents of the 2D scenario, since they don't mention any of the papers that have recently supported the 3D scenario (Da Cunha et al., 2017, 2018). From line 457 to the end of the paper, the whole discussion revolves around the 2D model and the Asgard origin of eukaryotes! They possibly consider that the debate has been closed by the paper of Williams and colleagues (Nat Ecol Evol, 2020), who criticized the work of Da Cunha and colleagues. They should note that Williams et al. still obtained a 3D tree with RNA polymerase (supplementary figure 1), except when they used amino-acid recoding, a method that reduces the phylogenetic signal (Hernandez and Ryan, BioRxiv, 2020). A 3D tree was again obtained with the RNA polymerase (including those of giant viruses and the three eukaryotic RNA polymerases) by Guglielmini et al., PNAS, 2019. The debate should thus be considered as still open.

    In any case, the phylogenies presented by the authors are not universal trees of life and cannot be used in the 2D versus 3D debate. A proponent of the 3D scenario would say that the Odin sequence present in group 1 corresponds to the real position of Asgardarchaeota, in agreement with the results of Da Cunha et al. (2017), who found that Asgardarchaeota are not the sister group to eukaryotes but branch deep within archaea.

    Since the enzymes studied here are apparently absent in most Asgards, and the number of Asgard homologues is very limited, it is profoundly misleading to label the group close to eukaryotes as Asgard in the Cover Art and to have a highlight claiming that eukaryotic Alg13/14 are closely related to the Asgard homologs, suggesting their acquisition during eukaryogenesis.

    It is also profoundly misleading to conclude in the title that their result "strengthens the hypothesis of an archaeal origin of the eukaryal N-glycosylation". One can only say that archaeal and eukaryotic N-glycosylation pathways are evolutionarily related.

    However, in the case of Alg13/Alg14, it seems that these eukaryotic proteins are more closely related to bacterial enzymes (EpsF) than to their archaeal homologues (groups 1 and 2)! We would like to know more about the phylogeny and distribution of EpsF in Bacteria and Archaea. According to the authors, they are only present in Euryarchaeota but widespread in Bacteria, suggesting an LGT from Bacteria to Archaea. Was this enzyme present in the Last Bacterial Common Ancestor? In summary, the authors' conclusions and formulations in the evolutionary part of their paper, especially in the title, the summary, the discussion and the Cover Art, are misleading and should be corrected.

  4. Reviewer #1 (Public Review):

    N-glycosylation, the covalent attachment of glycans to selected asparagine residues of target proteins, is a post-translational protein modification that occurs across evolution. However, in contrast to the relatively well described eukaryotic and bacterial N-glycosylation processes, the parallel event in Archaea is less well defined. In this manuscript, Meyer and colleagues report on Agl24, a new component of the N-glycosylation pathway of Sulfolobus acidocaldarius, an archaeal species that lives in hot and acidic conditions. Reminiscent of what is seen in eukaryotic N-linked glycans, the N-linked glycan of S. acidocaldarius also relies on a chitobiose (two N-acetylglucosamines) core. Agl24 was shown to add the second such sugar. This activity is similar to that catalyzed by eukaryotic Alg13/14 and bacterial MurG. Accordingly, similarities among these enzymes were first used to predict Agl24 function. Such activity was subsequently confirmed using various experimental approaches. Finally, the evolutionary implications of Agl24 contributing to the assembly of a chitobiose moiety much like Alg13/14 in eukaryotes were considered. Phylogenetic analyses lend support to the idea that eukaryotic N-glycosylation originated from a sub-set of Archaea.
