Heterogeneity of Genetic Sequence within Quasi-species of Influenza Virus Revealed by Single-Molecule Sequencing
Curation statements for this article:
Curated by eLife
eLife Assessment
This is a valuable methodological contribution towards accurate characterization of viral genetic diversity using long-read sequencing and unique molecular identifiers (UMIs). However, the methods are currently incomplete and the sensitivity is not rigorously demonstrated. Addressing these gaps would strengthen the manuscript and make it a key addition to the field.
Listed in: Evaluated articles (eLife)
Abstract
Influenza viruses are characterized by high mutation rates and extensive genetic diversity, which hinder effective vaccine development and facilitate immune evasion (Taubenberger & Morens, 2006; Barr et al., 2010). These mutations primarily arise from the error-prone activity of the viral RNA-dependent RNA polymerase, generating highly heterogeneous viral populations within individual hosts. This phenomenon aligns with the quasi-species model, in which a cloud of related viral genomes evolves under selective pressures (Domingo et al., 2012). Accurate characterization of this intra-host diversity is crucial for understanding viral evolution and informing future vaccine design. However, conventional RNA sequencing technologies often fail to reliably detect low-frequency variants due to technical errors introduced during sample preparation and sequencing steps.
In this study, we implemented a single unique molecular identifier (sUMI) approach to minimize sequencing artifacts and achieve an error rate of approximately 10⁻⁵. This high-resolution method enabled precise quantification of quasi-species diversity from influenza virus populations isolated at the single-particle level. Comparative analyses revealed mutation frequencies well above background error levels, confirming that the observed variation was of biological origin.
Furthermore, application of information-theoretic metrics such as Shannon entropy and Jensen-Shannon divergence demonstrated that the mutation distribution was non-random, suggesting the presence of selective constraints.
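For readers unfamiliar with the two metrics named above, the following is a minimal Python sketch of how Shannon entropy and Jensen-Shannon divergence can be computed for a discrete distribution of mutation counts. It is an illustration only, not the authors' analysis pipeline: the function names, the per-window binning, and the example counts are all hypothetical.

```python
import math

def normalize(counts):
    """Convert raw counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [c / total for c in counts]

def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence, in bits, between distributions p and q.

    Computed as H(M) - (H(P) + H(Q)) / 2 with M the pointwise average;
    the value lies between 0 (identical) and 1 (disjoint support).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical mutation counts in four equal-sized genome windows,
# compared with a uniform (random-placement) expectation.
observed = normalize([12, 1, 0, 7])
uniform = normalize([1, 1, 1, 1])
print(shannon_entropy(observed), jensen_shannon_divergence(observed, uniform))
```

Under this sketch, a divergence near zero would be compatible with randomly placed mutations, whereas a larger value indicates a skewed distribution; deciding whether a given value is significant would still require a null model, which is not shown here.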
Our findings establish a robust framework for studying intra-host viral evolution and provide critical insights that may enhance AI-driven prediction of mutational trajectories and support more effective influenza vaccine strategies.
Article activity feed
Reviewer #1 (Public review):
Tamao et al. aimed to quantify the diversity and mutation rate of influenza virus (PR8 strain) in order to establish a high-resolution method for studying intra-host viral evolution. To achieve this, the authors combined RNA sequencing with single-molecule unique molecular identifiers (UMIs) to minimize errors introduced during technical processing. They proposed an in vitro infection model with a single viral particle to represent biological genetic diversity, alongside a control model using in vitro transcribed RNA for two viral genes, PB2 and HA.
Through this approach, the authors demonstrated that UMIs reduced technical errors by approximately tenfold. By analyzing four viral populations and comparing them to in vitro transcribed RNA controls, they estimated that ~98.1% of observed mutations originated from viral replication rather than technical artifacts. Their results further showed that most mutations were synonymous and introduced randomly. However, the distribution of mutations suggested selective pressures that favored certain variants. Additionally, comparison with a closely related influenza strain (A/Alaska/1935) revealed two positively selected mutations, though these were absent in the strain responsible for the most recent pandemic (CA01).
Overall, the study is well-designed, and the interpretations are strongly supported by the data. However, the following clarifications are recommended:
(1) The methods section is overly brief. Even if techniques are cited, more experimental details should be included. For example, since the study focuses heavily on methodology, details such as the number of PCR cycles in RT-PCR or the rationale for choosing HA and PB2 as representative in vitro transcripts should be provided.
(2) Information on library preparation and sequencing metrics should be included, for example, the total number of reads, any filtering steps, and the quality score distributions and cutoffs for the analyzed reads.
(3) In the Results section (line 115, "Quantification of error rate caused by RT"), the mutation rate attributed to viral replication is calculated. However, in line 138, it is unclear whether the reported value reflects PB2, HA, or both, and whether the comparison is based on the error rate of the same viral RNA or the mean of multiple values (as shown in Figure 3A). Please clarify whether this number applies universally to all influenza RNAs or provide the observed range.
(4) Since errors introduced by T7 polymerase apply only to the in vitro transcription control, how were these accounted for when comparing mutation rates between transcribed RNA and cell-culture-derived virus?
(5) Figure 2 shows that a UMI group size of 4 has an error rate of zero, but this group size is not mentioned in the text. Please clarify.
Reviewer #2 (Public review):
Summary:
This manuscript presents a technically oriented application of UMI-based long-read sequencing to study intra-host diversity in influenza virus populations. The authors aim to minimize sequencing artifacts and improve the detection of rare variants, proposing that this approach may inform predictive models of viral evolution. While the methodology appears robust and successfully reduces sequencing error rates, key experimental and analytical details are missing, and the biological insight is modest. The study includes only four samples, with no independent biological replicates or controls, which limits the generalizability of the findings. Claims related to rare variant detection and evolutionary selection are not fully supported by the data presented.
Strengths:
The study addresses an important technical challenge in viral genomics by implementing a UMI-based long-read sequencing approach to reduce amplification and sequencing errors. The methodological focus is well presented, and the work contributes to improving the resolution of low-frequency variant detection in complex viral populations.
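The general idea behind UMI-based error correction can be sketched in a few lines of Python: reads sharing a UMI are assumed to derive from one input molecule, so a per-position majority vote across the family suppresses errors that occur in only a minority of its reads. This is a simplified illustration, not the pipeline used in the manuscript. It assumes reads are already aligned to equal length, and the function name and the `min_family_size` threshold of 3 are hypothetical choices.

```python
from collections import Counter, defaultdict

def umi_consensus(reads, min_family_size=3):
    """Build one consensus sequence per UMI family.

    reads: iterable of (umi, sequence) pairs; sequences within a family
    are assumed to be pre-aligned and of equal length.
    Families smaller than min_family_size (an assumed threshold) are dropped.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to outvote an error
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"reads in family {umi} differ in length")
        # Majority base at each position; ties go to the base seen first.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus

# Toy input: one family of three reads (one carrying an error) and a singleton.
example_reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),  # isolated error at position 3
    ("GGC", "ACGA"),  # singleton family, discarded
]
print(umi_consensus(example_reads))
```

In this toy input the isolated error is outvoted and the singleton family is discarded. A majority vote cannot remove errors introduced before the UMI is attached or in an early amplification cycle that dominates the family.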
Weaknesses:
The application of UMI-based error correction to viral population sequencing has been established in previous studies (e.g., in HIV), and this manuscript does not introduce a substantial methodological or conceptual advance beyond its use in the context of influenza.
The study lacks independent biological replicates or additional viral systems that would strengthen the generalizability of the conclusions. Potential sources of technical error are not explored or explicitly controlled. Key methodological details are missing, including the number of PCR cycles, the input number of molecules, and UMI family size distributions. These are essential to support the claimed sensitivity of the method.
The assertion that variants at ≥0.1% frequency can be reliably detected is based on total read count rather than the number of unique input molecules. Without information on UMI diversity and family sizes, the detection limit cannot be reliably assessed.
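To make the dependence on unique input molecules concrete, the sketch below treats each UMI family as one independent draw from the viral population and uses a binomial model to estimate the chance of observing a variant in at least a given number of families. This is a back-of-envelope model assumed here for illustration, not a calculation from the manuscript or the review; the function name, the `min_support` criterion, and the molecule counts are hypothetical.

```python
import math

def prob_detect(n_molecules, freq, min_support=2):
    """Probability that a variant at true frequency `freq` appears in at
    least `min_support` of `n_molecules` independently sampled molecules
    (binomial model; one UMI family counts as one molecule)."""
    p_below = sum(
        math.comb(n_molecules, i) * freq**i * (1 - freq) ** (n_molecules - i)
        for i in range(min_support)
    )
    return 1 - p_below

# Hypothetical numbers of unique molecules for a variant at 0.1% frequency.
for n in (1_000, 5_000, 10_000):
    print(n, round(prob_detect(n, 0.001), 3))
```

Under this model the detection probability is set by the number of UMI families, so additional reads that only enlarge existing families do not improve sensitivity.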
Although genetic variation is described, the functional relevance of observed mutations in HA and NA is not addressed or discussed in the context of known antigenic or evolutionary features of influenza. The manuscript is largely focused on technical performance, with limited exploration of the biological implications or mechanistic insights into influenza virus evolution.
The experimental scale is small, with only four viral populations derived from single particles analyzed. This limited sample size restricts the ability to draw broader conclusions about quasispecies dynamics or evolutionary pressures.