Pre-Cambrian origin of envelope-carrying retrotransposons in metazoans
Curation statements for this article:-
Curated by eLife
eLife Assessment
This important study provides convincing evidence that envelope-carrying Ty3/gypsy retrotransposons (errantiviruses) are ancient, widespread, and actively expanding across nearly all major animal phyla. Using comprehensive phylogenetic and AlphaFold2-based structural analyses, the authors show that these elements independently acquired membrane fusion proteins early in metazoan evolution, likely predating the bilaterian-non-bilaterian split. While some aspects could be more clearly contextualized and explained better, the work offers insights into the deep evolutionary roots of retroelement-envelope associations and the origins of retroviruses.
Abstract
Retrotransposons or endogenous retroviruses (ERVs) essentially carry open reading frames of gag and pol, which are utilized to selfishly replicate themselves in the host germline genome. A rare example of ERVs that additionally carry envelope genes is the Ty3/gypsy errantiviruses of Drosophila. Though they are structurally analogous to retroviruses, their evolutionary relationship as well as the origin of envelope genes remain elusive. We systematically searched for intact envelope-containing ERVs that are homologous to Ty3/gypsy in invertebrate metazoan genomes and found that they are widespread across taxa including ancient animals such as cnidarians, ctenophores and tunicates. Most elements are present in multiple copies in their respective genomes, implying recent germline expansion. Envelope genes are classified into those that resemble glycoprotein F from paramyxoviruses and glycoprotein B from herpesviruses, and both types are equally abundant and widespread. Phylogenetic and structural analyses revealed that envelope genes have largely diverged with pol genes as well as with the host organisms throughout their evolutionary history and recombined infrequently, suggesting that the acquisition of envelope genes by ERVs is ancient and likely dates to before the split of bilaterian and non-bilaterian animals in the Pre-Cambrian era.
Reviewer #1 (Public review):
Summary:
This manuscript provides a comprehensive systematic analysis of envelope-containing Ty3/gypsy retrotransposons (errantiviruses) across metazoan genomes, including both invertebrates and ancient animal lineages. Using iterative tBLASTn mining of over 1,900 genomes, the authors catalog 1,512 intact retrotransposons with uninterrupted gag, pol, and env open reading frames. They show that these elements are widespread, being present in most metazoan phyla, including cnidarians, ctenophores, and tunicates, with active proliferation indicated by their multicopy status. Phylogenetic analyses distinguish "ancient" and "insect" errantivirus clades, while structural characterization (including AlphaFold2 modeling) reveals two major env types: paramyxovirus F-like and herpesvirus gB-like proteins. Although both envelope types were identified in previous analyses two decades ago, the understanding of the evolutionary provenance of these envelope genes was rudimentary and largely anecdotal (I can say this because I authored one of these studies). The results in the present study support an ancient origin for env acquisition in metazoan Ty3/gypsy elements, with subsequent vertical inheritance and limited recombination between env and pol domains. The paper also proposes an expanded definition of 'errantivirus' for env-carrying Ty3/gypsy elements outside Drosophila.
Strengths:
(1) Comprehensive Genomic Survey:
The breadth of the genome search across non-model metazoan phyla yields an impressive dataset covering evolutionary breadth, with clear documentation of search iterations and validation criteria for intact elements.
(2) Robust Phylogenetic Inference:
The use of maximum likelihood trees on both pol and env domains, with thorough congruence analysis, convincingly separates ancient from lineage-specific elements and demonstrates co-evolution of env and pol within clades.
(3) Structural Insights:
AlphaFold2-based predictions provide high-confidence structural evidence that both env types have retained fusion-competent architectures, supporting the hypothesis of preserved functional potential.
(4) Novelty and Scope:
The study challenges previous assumptions of insect-centric or recent env acquisition and makes a compelling case for a Pre-Cambrian origin, significantly advancing our understanding of animal retroelement diversity and evolution. THIS IS A MAJOR ADVANCE.
(5) Data Transparency:
I appreciate that all data, code, and predicted structures are made openly available, facilitating reproducibility and future comparative analyses.
Major Weaknesses:
(1) Functional Evidence Gaps:
The work rests largely on sequence and structure prediction. No direct expression or experimental validation of envelope gene function or infectivity outside Drosophila is attempted, which would be valuable to corroborate the inferred roles of these glycoproteins in non-insect lineages. At least for some of these species, there are RNA-seq datasets that could be leveraged.
(2) Horizontal Transfer vs. Loss Hypotheses:
The discussion argues primarily for vertical inheritance, but the somewhat sporadic phylogenetic distributions and long-branch effects suggest that loss and possibly rare horizontal events may contribute more than acknowledged. Explicit quantitative tests for horizontal transfer, or reconciliation analyses, would strengthen this conclusion. It's also worth pointing out that, unlike retrotransposons that can be found in genomes, any potential related viral envelopes must, by definition, have a spottier distribution due to sampling. I don't think this challenges any of the conclusions, but it must be acknowledged as something that could affect the strength of this conclusion.
(3) Limited Taxon Sampling for Certain Phyla:
Despite the impressive breadth, some ancient lineages (e.g., Porifera, Echinodermata) are negative, but the manuscript does not fully explore whether this reflects real biological absence, assembly quality, or insufficient sampling. A more systematic treatment of negative findings would clarify claims of ubiquity. However, I also believe this falls beyond the scope of this study.
(4) Mechanistic Ambiguity:
The proposed model that env-containing elements exploit ovarian somatic niches is plausible but extrapolated from Drosophila data; for most taxa, actual tissue specificity, lifecycle, or host interaction mechanisms remain speculative and, to me, a bit unreasonable.
Minor Weaknesses:
(1) Terminology and Nomenclature:
The paper introduces and then generalizes the term "errantivirus" to non-insect elements. While this is logical, it may confuse readers familiar with the established, Drosophila-centric definition if not more explicitly clarified throughout. I also worry about changes being made without any input from the ICTV nomenclature committee, which just went through a thorough reclassification. Nevertheless, change is expected, and calling them all errantiviruses is entirely reasonable.
(2) Figures and Supplementary Data Navigation:
Some key phylogenies and domain alignments are found only in supplementary figures, occasionally hindering readability for non-expert audiences. Selected main-text inclusion of representative trees would benefit accessibility.
(3) ORF Integrity Thresholds:
The cutoff choices for defining "intact" elements (e.g., numbers/placement of stop codons, length ranges) are reasonable but only lightly justified. More rationale or sensitivity analysis would improve confidence in the inclusion criteria. For example, how did changing these criteria change the number of intact elements?
(4) Minor Typos/Formatting:
The paper contains sporadic typographical errors and formatting glitches (e.g., misaligned figure labels, unrendered symbols) that should be addressed.
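The sensitivity analysis asked for under minor point (3) could look roughly like the following Python sketch. It is only an illustration: the `Element` record, the threshold values, and the toy elements are all hypothetical and are not taken from the manuscript or its pipeline. It simply recounts "intact" elements under several combinations of minimum ORF length and tolerated in-frame stop codons.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical record for one candidate element (not the authors' data model)."""
    name: str
    orf_lengths: dict                                   # ORF name -> length in amino acids
    internal_stops: dict = field(default_factory=dict)  # ORF name -> in-frame stop codons


def is_intact(el, min_len, max_stops):
    """True if gag, pol and env all meet the length and stop-codon thresholds."""
    for orf in ("gag", "pol", "env"):
        if el.orf_lengths.get(orf, 0) < min_len[orf]:
            return False
        if el.internal_stops.get(orf, 0) > max_stops:
            return False
    return True


def sensitivity(elements, min_len_grid, max_stops_grid):
    """Count intact elements for every combination of thresholds."""
    table = {}
    for label, min_len in min_len_grid.items():
        for max_stops in max_stops_grid:
            table[(label, max_stops)] = sum(
                is_intact(e, min_len, max_stops) for e in elements
            )
    return table


# Toy input: all numbers below are invented for illustration only.
elements = [
    Element("a", {"gag": 300, "pol": 1000, "env": 500}),
    Element("b", {"gag": 300, "pol": 1000, "env": 350}, {"pol": 1}),
    Element("c", {"gag": 150, "pol": 900, "env": 500}),
]
min_len_grid = {
    "strict": {"gag": 200, "pol": 800, "env": 400},
    "relaxed": {"gag": 100, "pol": 800, "env": 300},
}
counts = sensitivity(elements, min_len_grid, max_stops_grid=[0, 1])
for key, n in sorted(counts.items()):
    print(key, n)
```

Reporting such a table next to the chosen cutoffs would show how strongly the headline count of intact elements depends on them.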
Reviewer #2 (Public review):
Summary:
The authors first surveyed metazoan genomes to identify homologs of Drosophila errantiviruses and classified them into two groups, "insect" and "ancient" elements, supporting the hypothesis of an early evolutionary origin for these retrotransposons. They subsequently identified two distinct types of envelope proteins, one resembling the glycoprotein F of paramyxoviruses and the other akin to the glycoprotein B of herpesviruses. Despite differences in their primary amino acid sequences, these proteins display notable structural similarity in their predicted domain architectures. The congruence between the phylogenies of the envelope and pol genes further supports the ancient origin of the envelope genes, challenging earlier hypotheses that proposed recent recombination events with baculoviruses. Additional analysis of the Pol "bridge region" corroborated the divergence among these elements, consistent with a pattern of limited cross-species recombination. Finally, by comparing these elements with non-envelope-containing Gypsy retrotransposons, the authors concluded that errantiviruses originated from multiple elements independently.
Strengths:
The conclusions of this study are based on a comprehensive collection of errantiviruses identified across a wide range of metazoan genomes. These findings are further supported by multiple lines of evidence, including phylogenetic congruence and the diverse evolutionary origins of envelope genes. AlphaFold2-assisted protein domain structure analyses also provided key insights into the characterization of these elements. Together, these results present a compelling case that errantiviruses arose independently through multiple evolutionary events, extending well beyond previous hypotheses.
Weaknesses:
It would be beneficial to emphasize in the Abstract the potential impact of this work by more clearly articulating the current knowledge gap in the field. While the second paragraph of the Introduction briefly touches on this point, highlighting the broader significance in the Abstract would better capture readers' interest. Additionally, some methodological choices would benefit from clearer justification and explanation. For instance, in Figure 6, the selection of the bridge region/RNase H domain is not explicitly explained, leaving the rationale for its choice unclear. As a minor point, some figure labels and text are too small and difficult to read, and improving their legibility would enhance overall clarity.
Reviewer #3 (Public review):
Summary and Significance:
In this work, Cary and Hayashi address the important question of when, in evolution, certain mobile genetic elements (Ty3/gypsy-like non-LTR retrotransposons) became associated with certain membrane fusion proteins (viral glycoprotein F or B-like proteins), which could allow these mobile genetic elements to be transferred between individual cells of a given host. It is debated in the literature whether the acquisition of membrane fusion proteins by non-LTR retrotransposons is a rather recent phenomenon that separately occurred in the ancestors of certain host species or whether the association with membrane fusion proteins is a much more ancient one, pre-dating the Cambrian explosion. Obviously, this question also touches upon the origin of the retroviruses, which can spread between individuals of a given host but seem restricted to vertebrates. Based on convincing data, Cary and Hayashi argue that an ancient association of non-LTR retrotransposons with membrane fusion proteins is most probable.
Strengths:
The authors take the smart approach of systematically retrieving apparently complete, intact, and recently functional Ty3/gypsy-like non-LTR retrotransposons that, next to their characteristic gag and pol genes, additionally carry sequences that are homologous to viral glycoprotein F (env-F) or viral glycoprotein B (env-B). They then construct and compare phylogenetic trees of the host species and individual encoded proteins and protein domains, where 3D-structure calculations and other features explain and corroborate the clustering within the phylogenetic trees. Congruence of phylogenetic trees and correlation of structural features is then taken as evidence for infrequent recombination and long-term co-evolution of the reverse transcriptase (encoded by the pol gene) and its respective putative membrane fusion gene (encoded by env-F or env-B). Importantly, the env-F and env-B containing retrotransposons do not form a monophyletic group among the Ty3/gypsy-like non-LTR retrotransposons, but are scattered throughout, supporting the idea of an originally ancient association followed by a random loss of env-F/env-B in individual branches of the tree (and rather rare re-associations via more recent recombinations).
Overall, this is valuable, stimulating, and important work of general and fundamental interest, but still also somewhat incompletely explored, imprecisely explained, and insufficiently put into context for a more general audience.
Weaknesses:
Some points that might be considered and clarified:
(1) Imprecise explanations, terms, and definitions:
It might help to add a 'definitions box' or similar to precisely explain how the authors decided to use certain terms in this manuscript, and then use these terms consistently and with precision.
a) In particular, these are terms such as 'vertebrate retrovirus' vs 'retrovirus' vs 'endogenized retrovirus' vs 'endogenous retrovirus' vs 'non-LTR retrotransposon' and 'Ty3/gypsy-like retrotransposon' vs 'Ty3/gypsy retrotransposon' vs 'errantivirus'.
b) The comment also applies to the term 'env' used for both 'env-F' and 'env-B', where often it remains unclear which of the two protein types the authors refer to. This is confusing, particularly in the methods, where the search for the respective homologs is described.
c) Other examples are the use of the entire pol gene vs. pol-RT for the definition of the Ty3/gypsy clade and for the generation of phylogenetic trees (Methods and Figure S1), and the names for various portions of pol that appear without prior definition or explanation (e.g., 'pro' in Figure 1A, 'bridge' in Figure S1C, 'the chromodomain' in the text and Figure 7).
d) It is unclear from the main text which portions of pol were chosen to define pol-RT and why. The methods name the 'palm-and-fingers', 'thumb', and 'connections' domains to define RT. In the main text, the 'connection' domain is called 'tether' and is instead defined as part of the 'bridge' region following RT, which is not part of RT.
(2) Insufficient broader context:
a) The introduction does not state what defines Ty3/gypsy non-LTR retrotransposons as compared to their closest relatives (Ty1/copia retrotransposons, BEL/pao retrotransposons, vertebrate retroviruses). This makes it difficult to judge the significance and generality of the findings.
b) The various known compositions of Ty3/gypsy-like retrotransposons are not mentioned and explained in the introduction (open reading frames, (poly-)proteins and protein domains, and their variable arrangement, enzymatic activities, and putative functions), and the distribution of Ty3/gypsy-like retrotransposons among eukaryotes remains unclear. The introduction does not mention that Ty3/gypsy-like retrotransposons apparently are absent from vertebrates, and Figure 7 is not very clear about whether or not it includes sequences from plants ('Chromoviridae').
c) The known association of Ty3/gypsy-like retrotransposons from different metazoan phyla with putative membrane fusion protein (env-like) genes is mentioned in the introduction, but literature information on whether such associations also occur in the context of other retrotransposons (e.g., Ty1/copia or BEL/pao) is not provided. The abstract is somewhat misleading in this respect. Finally, the different known types of env-like genes are not mentioned and explained as part of the introduction ('env-F', 'env-B', 'retroviral env', others?)
d) Some key references and reviews might be added:
- Pelisson, A. et al. (1994) https://www.embopress.org/doi/abs/10.1002/j.1460-2075.1994.tb06760.x
(next to Song et al. (1994), for the identification of env in Ty3/gypsy)
- Boeke, J.D. et al. (1999) In Virus Taxonomy: ICTV VIIth report (ed. F.A. Murphy). Springer-Verlag, New York.
(cited by Malik et al. (2000), for the definition and first use of the term 'errantivirus')
- Eickbush, T.H. and Jamburuthugoda, V.K. (2008) https://doi.org/10.1016/j.virusres.2007.12.010
(on the classification of retrotransposons and their env-like genes)
- Hayward, A. (2017) https://doi.org/10.1016/j.coviro.2017.06.006
(on scenarios of env acquisition)
(3) Incomplete analysis:
a) Mobile genetic elements are sometimes difficult to assemble correctly from short-read sequencing data. Did the authors confirm some of their newly identified elements by e.g., PCR analysis or re-identification in long-read sequencing data?
b) The authors mention somewhat on the side that there are Ty3/gypsy elements with a different arrangement (gag-env-pol instead of gag-pol-env). Why was this important feature apparently not used and correlated in the analysis? How does it map on the RT phylogenetic tree? Which type of env is found with either arrangement? Is there evidence for a loss of env also in the case of gag-env-pol elements?
c) Sankey plots are insufficiently explained. How would inconsistencies between trees (recombinations) show up here? Why is there no Sankey plot for the analysis of env-B in Figure 5?
d) Why are there no trees generated for env-F and env-B like proteins, including closely related homologous sequences that do NOT come from Ty3/gypsy retrotransposons (e.g., from the eukaryotic hosts, from other types of retrotransposons (Ty1/copia or BEL/pao), from viruses such as Herpesvirus and Baculovirus)? It would be informative whether the sequences from Ty3/gypsy cluster together in this case.
e) Did the authors identify any other env-like ORFs (apart from env-F and env-B) among Ty3/gypsy retrotransposons? Did they identify other, non-env-like ORFs that might help in the analysis? It is not quite clear from the methods if the searches for env-F and env-B - containing Ty3/gypsy elements were done separately and consecutively or somehow combined (the authors generally use 'env', and it is not clear which type of protein this refers to).
f) Why was the gag protein apparently not used to support the analysis? Are there different, unrelated types of gag among non-LTR retrotransposons? Does gag follow or break the pattern of co-evolution between RT and env-F/env-B?
g) Data availability. The link given in the paper does not seem to work (https://github.com/RippeiHayashi/errantiviruses_2025/tree/main). It would be useful for the community to have the sequences of the newly identified Ty3/gypsy retrotransposons listed readily available (not just genome coordinates as in table S1), together with the respective annotations of ORFs and features.
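Regarding point (3c), one simple way to make tree inconsistencies visible is to tabulate the link widths that a Sankey plot would draw. The Python sketch below assumes, purely for illustration, that every element has already been given a clade label in the pol tree and in the env tree; the labels, element names, and the majority rule are hypothetical and do not describe how the authors built their plots. Links that depart from the dominant pol-to-env pairing are where candidate recombinants would appear.

```python
from collections import Counter


def clade_flows(pol_clade, env_clade):
    """Count elements per (pol clade, env clade) pair, i.e. Sankey link widths."""
    shared = pol_clade.keys() & env_clade.keys()
    return Counter((pol_clade[e], env_clade[e]) for e in shared)


def minority_links(flows):
    """Return links that differ from the most common env partner of each pol clade."""
    best = {}
    for (pol, env), n in flows.items():
        if pol not in best or n > flows[(pol, best[pol])]:
            best[pol] = env
    return {link: n for link, n in flows.items() if best[link[0]] != link[1]}


# Toy labels, invented for illustration only.
pol_clade = {"e1": "P1", "e2": "P1", "e3": "P1", "e4": "P2", "e5": "P2"}
env_clade = {"e1": "F1", "e2": "F1", "e3": "F2", "e4": "F2", "e5": "F2"}

flows = clade_flows(pol_clade, env_clade)
candidates = minority_links(flows)
print(dict(flows))
print(candidates)
```

In a fully congruent data set every pol clade would feed a single env clade and `candidates` would be empty; crossing links are only candidates, since clade boundaries and tree uncertainty can also produce them.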
Author response:
We appreciate thorough and highly valuable feedback from the reviewers. We will take their suggestions on board and prepare a revised manuscript focusing on the following points:
(1) As reviewers pointed out, we did not evaluate horizontal transfer events of env-containing Ty3/gypsy elements. We consistently observed that elements found in the same phylum/class/superfamily cluster together in the POL phylogenetic tree, suggesting an ancient acquisition of env by the Ty3/gypsy elements; the separation would not be as clear as we observed had they been frequently gained from animals across different phyla/classes/superfamilies. However, this does not exclude more recent horizontal transfer events that may occur between closely related species. We will perform gene-tree species-tree reconciliation analyses in clades that have enough elements and represented species to estimate the frequency of horizontal transfer events.
(2) We did not find env-containing Ty3/gypsy elements in some animal phyla such as Echinodermata and Porifera, but this could be due to the quality or number of available genome assemblies as reviewers suggested. To address this, we will mine GAG-POL gypsy elements in the genomes that were devoid of GAG-POL-ENV elements and compare their abundance with other genomes that carry GAG-POL-ENV elements. If GAG-POL gypsy elements were similarly abundantly identified, that would indicate that the observed absence of GAG-POL-ENV elements is not due to poor quality of genome assemblies.
(3) We will include F-type and HSV-gB type ENV proteins from known viruses in the phylogenetic analysis to investigate their ancestry and potential recombination events with env-containing Ty3/gypsy elements.
(4) Wherever relevant, we will clarify the terms used in the manuscript, provide a rationale for our selection of POL domains used for structural and phylogenetic analyses, improve accessibility of figures, touch on gypsy elements in vertebrates, and make sure all concepts covered in the results are sufficiently introduced in the introduction.