Accumulated metagenomic studies reveal recent migration, whole genome evolution, and undiscovered diversity of orthomyxoviruses

Metagenomic studies have uncovered many novel viruses by looking beyond hosts of public health or economic interest. However, the resulting viral genomes are often incomplete, and analyses largely characterize the distribution of viruses rather than their dynamics. Here, we integrate accumulated data from metagenomic studies to reveal geographic and evolutionary dynamics in a case study of Orthomyxoviridae, the RNA virus family that includes influenza virus. First, we use sequences of the orthomyxovirid Wǔhàn mosquito virus 6 to track the migrations of its host. We then look at orthomyxovirus genome evolution, finding gene gain and loss across members of the family, especially in the surface proteins responsible for cell and host tropism. We find that the surface protein of Wǔhàn mosquito virus 6 exhibits accelerated non-synonymous evolution suggestive of antigenic evolution, i.e., vertebrate infection, and belongs to a wider quaranjavirid group bearing highly diverged surface proteins. Finally, we quantify the progress of orthomyxovirus discovery and forecast that many diverged Orthomyxoviridae members remain to be found. We argue that continued metagenomic studies will be fruitful for understanding the dynamics, evolution, and ecology of viruses and their hosts, regardless of whether novel viruses are identified or not, as long as study designs allowing for the resolution of complete viral genomes are employed.


The number of known virus species has increased dramatically through metagenomic studies, which search genetic material sampled from a host for non-host genes. Here, we focus on an important viral family that includes influenza viruses, the Orthomyxoviridae, with over 100 recently discovered viruses infecting hosts from humans to fish. We find that one virus called Wǔhàn mosquito virus 6, discovered in mosquitoes in China, has spread across the globe very recently. Surface proteins used to enter cells show signs of rapid evolution in Wǔhàn mosquito virus 6 and its relatives, which suggests an ability to infect vertebrate animals. We compute the rate at which newly discovered orthomyxovirus species add evolutionary history to the tree of life, predict that many viruses remain to be discovered, and discuss what appropriately designed future studies can teach us about how diseases cross between continents and species.

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

    Reply to the reviewers

    Reviewer #1 (Evidence, reproducibility and clarity (Required)):

    Summary: The authors use an unclassified quaranjavirus, Wǔhàn mosquito virus 6 (WuMV-6), to demonstrate the possibility of orthomyxovirid global transmission dynamic analyses. The focused surface protein analysis strongly indicates a vertebrate host for WuMV-6 in addition to the insect host. The analysis is then expanded to other quaranjaviruses, which differ considerably in their surface glycoproteins, indicating a complex evolution. Finally, the authors scientifically demonstrate that orthomyxovirids are undersampled and hence that this family will have to expand considerably in the future.

    Major comments: none

    We thank the reviewer for a succinct summary of our study and we are very glad the key messages were sufficiently clear.

    Minor comments: The article lacks precision and hence some global edits are in order. Generally:

    1. For clarity to the reader, please introduce the family Orthomyxoviridae, i.e., its current official composition (i.e., 9 genera, 21 species, and 22 viruses), so the reader is not confused by terms such as "quaranjavirus" or "isavirus" etc.

    This is a fair request though we would prefer to err on the side of caution with regards to the precise number of taxonomic ranks given the flux viral taxonomy has experienced and in light of the deluge of new taxa being discovered all the time. We refer to the “traditional” view of orthomyxovirid taxonomy at the genus level, encompassing the genera described up until 2011.

    After that, please clearly indicate which viruses are classified and which ones are not. For instance, the main virus dealt with in this paper is unclassified, and so are Astopletus and Ūsinis viruses.

    We do not think this is reasonable since virtually all RNA viruses discussed in the text are not classified and their status as such has little bearing on any of our findings.

    Please ensure correct spelling, including diacritics, of the viruses and abbreviations throughout: Wǔhàn mosquito virus 6 (WuMV-6); Húběi orthomyxo-like virus 2 [note the deletion of one "virus"]; Wēnlǐng orthomyxo-like virus 2

    Thank you for the comment, we have added the diacritics where we could identify them but may have missed some.

    For orientation of the reader, please refer to family groups of viruses as -virids (e.g., "orthomyxovirids", "human coronavirids", "some rhabdovirids"). This way it is clear to the reader that, for instance, "quaranjaviruses" refers to a genus-level group

    Thank you, we agree that this adds much needed precision in terminology.

    "influenza" is a disease. There are several viruses that can cause influenza; they belong to four different genera. Please scan for "influenza" and replace each either with a virus name (for instance, in the abstract, "...RNA viruses containing influenza A virus" or with a genus name (e.g., "alphainfluenzaviruses")

    Our apologies for that misnomer. The text has been corrected.

    Please ensure the differentiation of taxa (concepts), such as species, and viruses (things). Orthomyxoviridae cannot infect anything, it can also not be sampled etc. Orthomyxovirids, the physical members of Orthomyxoviridae can infect things. Most instances of "Orthomyxoviridae" should be replaced accordingly.

    Thank you for the comment, this has been corrected as suggested.

    In particular:

    1. The title doesn't make much sense. Orthomyxovirids are not taxonomically incomplete - they are things that we simply may not have samples or may have characterized incompletely. Also, the analyses are largely restricted to quaranjaviruses. Hence, I would suggest "...genome evolution, and broad diversity of quaranjaviruses"

    Our apologies for the confusion. The analyses we carried out to quantify the evolutionary orthomyxovirid diversity likely waiting to be discovered were performed on all known (at the time) members of Orthomyxoviridae and thus the title must still refer to the entire family rather than quaranjavirids. We felt that the term “taxonomic incompleteness” imparts on the reader exactly what the reviewer refers to, namely that new taxonomic ranks are likely to come as more evolutionary diversity gets uncovered. Alternative and more precise formulations, like referring to evolutionary incompleteness or something similar, would miss the fact that it is taxonomy that discretises the otherwise continuous evolutionary change.

    Abstract: genomes are not employed and do not make money. Please replace "employed" with "used"

    We have to respectfully disagree since the definition of the word “employ” also includes the meaning “to make use of”.

    Re: point 6 above, Introduction: species/families etc. cannot be discovered. They are being established by people for viruses that may be discovered. Please fix here and elsewhere (in most cases, "species" should be replaced with "viruses")

    We agree that taxonomic ranks are designated and not discovered and have changed the text accordingly.

    P3, second paragraph: please place "jingmenviruses" in quotation marks as this is not an official term (yet). Please add "potentially" ("as potentially causing human disease"). Even the authors only speak of an "association" and do not fulfill Koch's postulates

    We have to respectfully disagree here too. “Jingmenviruses” as a term is unambiguous in referring to a group of related segmented flaviviruses even though the group is not officially assigned a taxonomic rank. We have altered the text to add uncertainty to the claim that jingmenviruses cause disease in humans.

    P3, top right column: "e.g., the tick-borne Johnston Atoll quaranja- and thogotoviruses" is ambiguous. Please change to "e.g., the tick-borne quaranja- and thogotoviruses" or list particular viruses and clarify which belong to which genus

    Apologies for the confusion. We fixed this instance.

    P3, right column "smaller number" - change to "lower number"

    We have altered the offending sentence in response to reviewer 2 and this combination of words is no longer present.

    P3, right column "or only the polymerase" - makes no sense to the reader as it has not been introduced; and grammatically needs to be improved as the polymerase is also encoded on a segment. Likewise, PB1 makes no sense to the unacquainted reader - maybe add a few sentences to the intro right after the family introduction on general genome composition and that PB1 is part of the polymerase holoenzyme?

    We have altered the offending sentence in response to reviewer 2 but we take the point. We’ve added detail about the RNA-directed RNA polymerase of orthomyxovirids to the introduction.

    P4: the Ebola virus glycoprotein is called GP1,2 [with 1,2 in subscript] (also Figure 2 legend)

    Respectfully, while the reviewer is technically correct in that the glycoprotein of Ebola virus is referred to as GP_1,2 in proteomics literature (the 1,2 referencing the protein held together by a cysteine bridge post-cleavage), calling it GP is not out of place in evolutionary studies and the term “Ebola virus GP” is unambiguous to the reader.

    P4: please change "West Africa" to "Western Africa" (the designation of the area by the UN)

    Unfortunately, while we agree that the reviewer is correct in that the UN refers to the region as “Western Africa”, references to the “West African Ebola virus epidemic” are ubiquitous in the literature and thus we do not see the reason to change the term here either.

    P6: change "with Rainbow / Steelhead trout orthomyxoviruses" to "with mykissviruses (rainbow trout orthomyxovirus and steelhead trout orthomyxovirus)" [note that virus names are not capitalized except for proper noun components; hence also "infectious salmon anemia virus", bottom right column]

    While we recognise that viruses related to infectious salmon anaemia virus discovered in trout have received a separate taxonomic designation we feel very strongly about not mentioning it in our manuscript. Our fear is that “mykissviruses” have been designated too hastily on the basis of a handful of representatives and that relatives discovered in the future may show an indiscernible continuum between “mykissviruses” and isaviruses, invalidating the former as a valid term. We would therefore strongly prefer to keep references to specific viruses rather than a taxonomic designation that may disappear so that a future reader may have an easier time with our study.

    P6, right column: please change "RNA-dependent" to the IUPAC/IUB-correct "RNA-directed"


    Figure 2 is too small. I could not figure out B with or without my confocals... Likewise S2, S3 are way too small. In Fig 2 legend, please place "spike" into lower case

    We understand the reviewer’s concern here but Figure 2B was a compromise between vertical space available on a page, the number of taxa in the PB1 tree, and what we thought important to communicate - the variation in segment number across orthomyxoviruses and mapping of PB1 diversity to gp64 diversity. This was done at the expense of individual taxon name visibility whilst fully zoomed out. To remedy this Figure 2B was rendered in 300 dpi resolution such that zooming in will show individual taxon names clearly. We ultimately hope to publish our study in an online-only journal where printing will not present an issue. Likewise for figures S2 and S3. We have changed “Spike” to be lower case in the legend.

    Figure 3: correct spelling of virus names (from top to bottom): rainbow trout orthomyxovirus, infectious salmon anemia virus, influenza C virus, influenza D virus, influenza A virus, influenza B virus, Wēnlǐng orthomyxo-like virus 2, Dhori virus, Thogoto virus, Jos virus, Aransas Bay virus, ... Johnston Atoll virus, Quaranfil virus, Húběi orthomyxo-like virus 2, Hǎinán orthomyxo-like virus 2, Wǔhàn mosquito virus 6. Also apply to S6 and others where applicable.

    The names for viruses in Figure 3 were taken directly from their NCBI records and since we do not show their accessions there is no other way to disambiguate them to the reader. We have, however, added the necessary diacritics where appropriate.

    [PS: based on the somewhat backward, non-UNICODE editorial manager system, I am worried that the diacritics in virus names above are not rendered correctly. If so, please look up the Pinyin spelling of Wuhan, Hainan, Wenling etc. - easiest way is to search Wikipedia for the terms and then identify the Pinyin spelling, which is typically pointed out]
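The rendering concern raised in this postscript can be checked mechanically. The following is a minimal sketch, not part of the manuscript or the review: it assumes virus names are available as Python strings, flags any name containing the Unicode replacement character (U+FFFD) that lossy systems substitute for undecodable bytes, and normalises intact names to NFC so that precomposed and combining-mark spellings compare equal. The helper names `find_damaged_names` and `to_nfc` are hypothetical.

```python
import unicodedata

# U+FFFD is what many systems emit when a character cannot be decoded.
REPLACEMENT_CHAR = "\ufffd"


def find_damaged_names(names):
    """Return the names that contain the Unicode replacement character."""
    return [name for name in names if REPLACEMENT_CHAR in name]


def to_nfc(name):
    """Normalise a name to NFC so combining diacritics become precomposed."""
    return unicodedata.normalize("NFC", name)


# Illustrative input: one damaged name and one intact name.
names = ["W\u01d4h\ufffdn mosquito virus 6", "Wēnlǐng orthomyxo-like virus 2"]
damaged = find_damaged_names(names)
```

A damaged name cannot be repaired automatically, since the original character is lost; the flagged entries would still need to be looked up by hand, as the reviewer suggests.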


    I think we (all reviewers) are all largely in agreement - this is a very useful study; the manuscript just needs various adjustments. I agree with the requests of the other two reviewers.

    Reviewer #1 (Significance (Required)):

    The strength of the paper is that it provides a road map on how undersampled taxa may be analyzed and which kind of information can be gleaned from these analyses. The paper also demonstrates that the analysis of seemingly "unimportant" viruses can prove important. The limitation of the paper is that there is no true novel revelation here. The sampling sites of WuMV-6 GenBank records already suggest broad distribution, which often goes along with sequence diversity; the continued discovery of orthomyxovirids in metagenomic studies clearly implied undersampling (but it is nice to have this "gut feeling" scientifically fortified now). The paper is useful for evolutionary virologists, virus taxonomists, orthomyxovirid specialists, and invertebrate virologists.

    We respectfully disagree with the reviewer and believe they may have missed an important point raised by our study. We do not claim that a global distribution of WuMV6 is what makes it remarkable but that 1) its sampled diversity is sufficient to calibrate molecular clocks (in our experience this is not always the case for arthropod viruses) and 2) WuMV6 has reached its current global distribution recently.

    Reviewer #2 (Evidence, reproducibility and clarity (Required)):

    This is a nice example of bringing together a variety of data from metatranscriptomic studies to answer fundamental evolutionary questions in the field of viral evolution. There is a focus on a single virus family, and although some might see this as a little restrictive, I think the 'deep-dive' presented in this paper leaves space for a relatively detailed and comprehensive analysis. No doubt, other studies will gain inspiration from the approach presented here and expand this work to other viral groups.

    Overall, the paper is very well written, and the figures are of a very high quality. It is a shame that there are only 3 main figures in the paper because the supplementary figures are well presented and informative.

    We thank the reviewer for the kind words.

    The manuscript discusses the importance of host quite a bit, and for that reason it would have been nice to try and incorporate the host of the various viruses into the figures somehow (perhaps as a supplementary, since the trees are already quite busy). This might help orientate the reader.

    While we appreciate that host information is of interest, we foresee several issues. For one, we refer to broad host classes (essentially arthropod versus vertebrate) because they are largely determined by membrane fusion protein classes, the actual focus of our study, which exhibit strong phylogenetic signal. Secondly, host information in metagenomic studies can be imprecise, incorrect or unavailable.

    I have some minor comments or suggestions for the authors to consider below. Note, please use line numbers in the future for your submissions.

    A paragraph in the discussion laying out the limitations of this approach would be useful to the reader and would make this excellent paper even more robust.

    Thank you for the suggestion. We presume the reviewer is referring to our interpolation of orthomyxovirid diversity and have included a few sentences about the limitations of this approach in the Discussion.

    Pg 3. The sentence starting 'The vast majority of known orthomyxoviruses use one...' should be made into two sentences to make it easier to read. A second sentence for the arthropod description is the obvious edit.

    We appreciate the suggestion and have included it in the manuscript.

    Pg 3. 'The number of segments of orthomyxoviruses with genomes known to be complete varies from 6 to 8'. Rephrase to - 'Orthomyxovirus genomes are known to have 6-8 segments, but many metagenomically discovered viruses in this group have incomplete genomes...etc...'

    Thank you for the suggestion, it has been included.

    Figure 1 - what do the white triangles mean? Are these the directions of reassortment? This should be explained in the legend...

    We apologise for the omission, this is now explained.

    New Zealand is covered up by the circular tree. It looks like there is a point which is partially obscured.

    The reviewer spotted a mistake on our part here. The figure included the coordinates for Wellington, New Zealand when the detection was actually in Wellington Shire, Australia. This has been fixed.

    PD analysis - I think you assume that viruses are static in this analysis. As we all know, they continue to mutate and eventually new species will evolve. Is it possible to consider the mutation rate in this analysis and the evolution of new variants, eventually leading to new species? It might be complicated, and maybe a matter for future work, but it might be worth discussing this as a limitation at the very least. Especially when extrapolating to the future (although you do not extrapolate too far, so maybe this is not an issue here...). You could choose to discuss this in relation to the bird analogy (which was great), and compare the rate of mutation which will lead to the evolution of new species on a totally different time scale.

    We appreciate the point raised by the reviewer and while we wholly agree that the possibility of new viral taxa arising over time is an important caveat, we felt the discussion ends up being rather short. On one hand taxa definitions for different viral groups can be different, and on the other speciation in RNA viruses is difficult to place in absolute time because of a phenomenon called time-dependence of evolutionary rates. Methods accounting for the latter using sophisticated models or external calibration points would seem to imply that speciation timescales exceed those of research.
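For readers unfamiliar with the PD (phylogenetic diversity) analysis discussed in this exchange, the idea of measuring how much branch length each newly discovered virus adds to a tree can be illustrated with a minimal sketch. This is not the authors' code: it assumes a rooted tree stored as hypothetical `parent` and `length` dictionaries, uses rooted Faith's PD (the summed length of all branches on the paths from the chosen tips to the root), and operates on a made-up four-branch toy tree. Like the analysis the reviewer describes, it treats the tree as static.

```python
def phylogenetic_diversity(parent, length, tips):
    """Sum the lengths of all branches connecting the given tips to the root."""
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk towards the root, stopping where an earlier tip's path was counted.
        while node in parent and node not in seen:
            seen.add(node)
            total += length[node]
            node = parent[node]
    return total


def accumulation_curve(parent, length, discovery_order):
    """PD after each successive discovery, in the order the tips were found."""
    return [
        phylogenetic_diversity(parent, length, discovery_order[:k])
        for k in range(1, len(discovery_order) + 1)
    ]


# Toy tree (assumed for illustration): root -> A, root -> X, X -> B, X -> C.
parent = {"A": "root", "X": "root", "B": "X", "C": "X"}
length = {"A": 1.0, "X": 0.5, "B": 0.5, "C": 0.25}

curve = accumulation_curve(parent, length, ["B", "C", "A"])
```

In this toy example the close relative C adds only its own short branch, whereas the divergent A adds a full unit of branch length, which is the kind of contrast an accumulation curve is meant to expose.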

    Discussion: When discussing the hypothesis that WMV6 diversity is a result of repeat exposure to vertebrate hosts, can you also describe the alternative hypothesis here, and why the evidence leads you to put more weight on the former.

    This is a fair question and we have mentioned in the discussion an alternative hypothesis that has been brought up by our colleagues before, namely that alternating between different hosts induces divergent selection pressures on gp64. We contend that, since gp64 proteins are thought to use a highly conserved host receptor (NPC1), it is likely that no major changes are required when switching hosts. We are open to discussing other alternatives if the reviewer has suggestions.


    Seems like we are all in agreement and that after some minor adjustments this will be an excellent contribution.

    Reviewer #2 (Significance (Required)):

    Please see my review above. I did not use your formatting suggestions since I only saw it upon completing my review.

    Reviewer #3 (Evidence, reproducibility and clarity (Required)):


    This manuscript describes the use of data from metagenomic analyses to make inferences about the evolutionary and geographic history of the Orthomyxoviridae family of viruses and their hosts. Data from Wuhan Mosquito Virus 6 (WMV6) derived from various RNA-seq analyses is used to analyse loss and gain of virus segments over time, the time since the last common ancestor of these segments and the selection pressure acting on different genes. These results are used to hypothesise about which species have vectored this virus in the past and their geographic distribution. The additional phylogenetic diversity provided by characterisation of additional viruses of this species is quantified and projected into the future to demonstrate the value of further work in this area. The study also demonstrates more generally the benefit of additional sequencing and of characterising viruses in metagenomic datasets, even in cases where novel viruses are not identified.

    Major Comments

    The methodology in this manuscript appears to be sound and the results support the conclusions. Appropriate and detailed analyses have been performed and are described in detail. Code is provided to allow the results to be reproduced. The figures are informative and very well presented. I do not think any additional analyses are required.

    We thank the reviewer for the kind words.

    Minor Comments

    The manuscript is a little hard to follow in places. I think a brief introduction of WMV6 in the introduction section would help with this - where has it been isolated previously and what is known about its evolutionary history (if anything), how is it related to other Orthomyxoviruses. This information is included later but it would improve the flow of the paper to include it in the introduction.

    We apologise for the inconvenience and agree with the reviewer. We have improved the flow of the manuscript per reviewer suggestion.

    I think including a little more about the Method in the Results section would also be helpful, to save the reader jumping back and forth in order to understand the results. For example, at the beginning of the results section, briefly detailing how many samples were included, their broad geographic location and what the analysis is intended to show (e.g. "three full length sequences isolated from China, seven from Australia [...], between 1995 and 2019, were used to generate a reassortment network, in order to show.....") would be helpful. Each of the subsections of the Results would benefit from something similar.

    Apologies for the lack of clarity on our part. We have added more methodological information to each section in the results.

    Although it is clear in the Materials and Methods which datasets have been included, it is less apparent why these were selected. For example, in Figure 1A there are five countries listed - are these countries for which a particularly large number of full length sequences were available or for which any full length sequence is available? Similarly, for Figure 1B, are these all of the countries where a dataset has originated containing any segment of WMV6?

    The confusion is entirely our fault as we have clearly not provided sufficient detail. This has been fixed now by explaining this better in the methods and Figure 1 legend.

    In the Discussion, it is stated that the frequency and fast evolution of WMV6 place it uniquely to enable tracking of mosquito populations, however there is no evidence presented to support this - does WMV6 evolve faster or occur more frequently than other mosquito RNA viruses?

    Our apologies for the jump in logic. We now expand on what we meant by the following sentence in the discussion: “In our experience, metagenomically discovered RNA viruses can be rare or, when encountered often, do not always contain sufficient signal to calibrate molecular clocks (Webster et al. 2015).”


    I also agree with the requests of the other two reviewers and that the manuscript will be in great shape once these are included.

    Reviewer #3 (Significance (Required)):

    This manuscript is very interesting, for the specific results presented here but, more importantly, in opening up further avenues for investigation. The study provides a proof of concept for using viruses derived from metagenomic data for specific and detailed evolutionary and ecological analyses of a single species. The scope of the analysis performed on WMV6 is not particularly broad, but it differs from the typical analysis of viruses in metagenomic datasets, which tends to focus on identification and characterisation of novel viruses only. I believe that this work is valuable to others working in the field, reveals additional potential in existing data and could provide inspiration for many future studies. To my knowledge, it is one of the first studies to focus on a single, fairly under-studied virus, and draw ecological conclusions based on only bioinformatic analyses.

    I think the results presented here for WMV6 may be of interest to a specialised audience, but that the manuscript overall is valuable to a broad audience, including ecologists, evolutionary biologists and virologists conducting fundamental science research.

    We appreciate the reviewer’s kind words.

