Redefining De Novo Gammaherpesvirus Infection Through High-Dimensional, Single-Cell Analysis of Virus and Host

Abstract

Virus infection is frequently characterized using bulk cell populations. How these findings correspond to infection in individual cells remains unclear. Here, we integrate high-dimensional single-cell approaches to quantify viral and host RNA and protein expression signatures using de novo infection with a well-characterized model gammaherpesvirus. While infected cells demonstrated genome-wide transcription, individual cells revealed pronounced variation in gene expression, with only 9 of 80 annotated viral open reading frames uniformly expressed in all cells, and a 1000-fold variation in viral RNA expression between cells. Single-cell analysis further revealed positive and negative gene correlations, many uniquely present in a subset of cells. Beyond variation in viral gene expression, individual cells demonstrated a pronounced, dichotomous signature in host gene expression, revealed by measuring host RNA abundance and post-translational protein modifications. These studies provide a resource for the high-dimensional analysis of virus infection, and a conceptual framework to define virus infection as the sum of virus and host responses at the single-cell level.

HIGHLIGHTS

CyTOF and scRNA-seq identify wide variation in gene expression between infected cells.
Host RNA expression and post-translational modifications stratify virus infection.
Single cell RNA analysis reveals new relationships in viral gene expression.
Simultaneous measurement of virus and host defines distinct infection states.

This manuscript is in revision at eLife

The decision letter after peer review, sent to the authors on October 28 2020, follows.

Summary

This work by Berger et al. examines de novo infection by a model gammaherpesvirus, MHV68, using two complementary single-cell approaches, CyTOF and scRNA-seq. With these methods, the authors characterize host and viral expression of protein and RNA during infection. From CyTOF of numerous host proteins and one viral protein, they propose that the DNA damage marker pH2AX along with the viral protein vRCA are more precise indicators of progressive infection than a standard LANA reporter. Using a single viral RNA (ORF18) and a single host RNA (Actin), they demonstrate that pH2AX+, vRCA+ cells uniformly express ORF18. To examine viral RNAs more closely, they performed scRNA-seq on infected cells and observed a high level of heterogeneity in viral gene expression.

The manuscript is very well written and could potentially be a very welcome addition to the growing field of single-cell virology. However, some concerns were raised regarding some of the conclusions and the validation of the results. In particular, the variability in gene expression does not fall into existing models of kinetically regulated waves of viral transcription. This and their previous work convincingly argue that bulk measurements of protein and RNA are insufficient to represent the complexity of de novo MHV68 infection. However, in the absence of functional significance for the many clusters identified, the impact of the conclusions is limited. With regard to validation, the authors must also consider the inherent variability in scRNA-seq technology, which could complicate the accurate measurement of viral RNA. This should be discussed and addressed with additional data and/or experiments (see below).

Essential Revisions

The reviewers agreed that this article will be a very useful resource for the single-cell virology community, but it requires further validation to realize that potential. As such, this article should be resubmitted as a "Tools and Resources" article. Furthermore, this revision should pay careful attention to the additional essential revisions that follow this point; in particular, there are areas that require more data for validation. Ideally, existing data or experiments closely related to those conducted can be used.
One of the more dramatic conclusions from the paper is that while the median infected cell expressed 52 viral genes, this ranges from 12 to 66, with only a handful of genes expressed uniformly. However, there are a number of indications that this may be explained instead by the stochastic failure to detect lowly expressed viral genes: 1) Figure 1A shows a tight distribution of the number of viral genes detected, which would be unlikely if there were multiple classes of infected cells expressing different subsets of viral genes. 2) Figure 1B shows a strong relationship between the average expression level and the frequency of detection, most easily explained by poor capture efficiency or another technical artifact resulting in undersampling. 3) These results fail to recapitulate known kinetic classes or uniform LANA expression. 4) Figure S3 indicates that even among host genes, the median cell had only ~1,000 genes detected, likely too small a fraction of expressed genes to assess viral gene number. These inconsistencies make it difficult to assess whether the observed heterogeneity is a true reflection of the gene expression profiles during infection or a reflection of the inability to detect lowly expressed transcripts by scRNA-seq.
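The undersampling concern in point 2 can be illustrated with a minimal sketch. It assumes, purely for illustration and not as a description of the authors' pipeline, that each transcript is captured independently, so that captured counts per cell are roughly Poisson and the chance of detecting a gene at least once is 1 - exp(-mean copies x capture efficiency). The function names and the 10% capture efficiency are hypothetical.

```python
import math


def detection_probability(mean_copies, capture_efficiency=0.1):
    """Probability that a gene is seen at least once in a single cell.

    Assumes Poisson sampling of transcripts (an illustrative assumption);
    capture_efficiency is a hypothetical value, not taken from the manuscript.
    """
    expected_captured = mean_copies * capture_efficiency
    return 1.0 - math.exp(-expected_captured)


def expected_genes_detected(mean_copies_per_gene, capture_efficiency=0.1):
    """Expected number of genes detected in one cell under the same model."""
    return sum(
        detection_probability(copies, capture_efficiency)
        for copies in mean_copies_per_gene
    )


if __name__ == "__main__":
    # Detection frequency rises with expression level even though, in this
    # toy model, every cell expresses every gene.
    for copies in (1, 10, 100):
        print(copies, round(detection_probability(copies), 3))
```

Under these assumptions, a relationship between average expression and detection frequency like the one in Figure 1B would arise from sampling alone, which is why an orthogonal measurement is needed to separate technical dropout from biological heterogeneity.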

Given the inherent "noisy" nature of scRNA-seq, it is usually hard to quantify how much of the expression variability of a given mRNA among individual cells is due to technical limitations and how much is due to biological differences. The authors could settle this question for at least a small number of genes by comparing the variability they see in scRNA-seq to that measured by PrimeFlow and CyTOF (the latter has the added complication of comparing RNA to protein, but would still be valuable to discuss). If they compare the heterogeneity observed for the given proteins in CyTOF with what they observe for the corresponding transcripts in scRNA-seq, they will both validate their finding and be able to estimate how much of the variability in scRNA-seq translates to the protein level. They can do the same with their PrimeFlow data, which would be even more informative, as both methods measure transcripts. These approaches would be ideal, as the data should be readily available. Alternatively, some of the expression should be corroborated by RT-qPCR or by Northern blot or, if single-cell resolution is necessary, by in situ hybridization.

The fact that the data do not pick up the established signatures of early vs. late gene expression goes against the bulk of work on viral gene expression control. More discussion about why this may be, including limitations of scRNA-seq for less abundant transcripts, is warranted.

In Figure 3A, the authors observe and note both pH2AX+, vRCA- and pH2AX-, vRCA+ cell populations; based on ORF18 or Actin expression, a significant fraction of these cells are infected. The proportion of cells in each gate is not quantified, but it appears that these single-positive cells represent a significant fraction of the total infected cells. However, in Figure 1C there appear to be no major single-positive populations, and the authors note that vRCA and pH2AX levels are highly correlated. This suggests that the cells are missing from the CyTOF analysis (perhaps lying outside of the two gates presented in Figure S1A). These missing cells undercut the value of the dataset and analysis and may lead to incorrect interpretations of pH2AX's value as a marker. Addressing this discrepancy in the PrimeFlow/CyTOF data and some form of validation of scRNA-seq (either by leveraging their protein data or via independent experiments) will be important for establishing the datasets as a reliable resource.

Two related issues in the text: Line 217. "demonstrate that pH2AX+ and vRCA+ show progressive infection.." Progression implies that the study occurs over different time points, but the time parameter is not measured in these studies. It is not clear to me whether these different phenotypes relate to different temporal stages of the infection or whether they are different terminal outcomes. The authors should use a term other than "progressive" in this context. Line 423: the use of the word "progression" implies temporal studies, which were not performed in this work. The study is a snapshot of a single time point and "progression" is inferred.

Phenotype variation may be due to variation in cell cycle stage, cell viability and age, and asynchronous infection. To what extent are these variables controlled or considered in the analysis?
In Figure 3, the authors show that ~20% of mock-infected cells are negative for beta-actin RNA. This seems quite odd for a housekeeping gene, and the corresponding PrimeFlow data are not shown. I assume that this has to do with the authors' gating strategy, or some technical issue with PrimeFlow that prevents all RNA molecules from being labeled. In either case, it would be helpful if the authors clarified this point and included the data for the mock cells in the figure.
Could the authors explain their rationale for including the CycKO mutant in the analysis and for combining the wt and KO data into one analysis? A priori, if the mutant has no effect on the current question (de novo infection of fibroblasts), I would suggest excluding it from the paper and only showing the wt data, or presenting the data for the mutant in a supplementary file, stating that similar results were obtained with it. Although the authors state that only five genes were differentially expressed between the wt and mutant, it seems wrong to aggregate the data from the different viruses into a single analysis.
In Figure 6 and the accompanying text, the authors make a distinction between "virus-biased" and "host-biased" cells, based on the % of viral genes expressed in each cell. They go on to claim that "no significant difference in host gene expression among expressed genes" was found between these two groups. The statistical analysis for this result seems to be an ANOVA test, which I believe is not appropriate for this analysis. As the authors are comparing two distributions, something like a Kolmogorov-Smirnov test is needed. Additionally, in the text (line 314), the authors claim that no substantial difference is seen for cell-cycle genes between "virus-biased" and "host-biased" cells (Figure S6A). Looking at the data, it seems to me that G2 cells are highly enriched in the "host-biased" group. A formal quantitative analysis is needed to make this point.
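As a rough illustration of the kind of two-distribution comparison suggested here, the following minimal sketch computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions) with an asymptotic p-value. The function name and the input values are hypothetical, and the p-value uses the standard large-sample series approximation with a common small-sample correction, so it should be treated as approximate for small groups.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p-value) for a two-sample KS test.

    D is the maximum absolute difference between the two empirical CDFs.
    The p-value is an asymptotic approximation, not an exact small-sample value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    effective_n = math.sqrt(n * m / (n + m))
    lam = (effective_n + 0.12 + 0.11 / effective_n) * d
    if lam < 0.3:
        # The series converges poorly here and the p-value is ~1 anyway.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical per-cell host expression values for two groups of cells.
    virus_biased = [1.0, 2.0, 3.0, 4.0, 5.0]
    host_biased = [6.0, 7.0, 8.0, 9.0, 10.0]
    print(ks_two_sample(virus_biased, host_biased))
```

Unlike a comparison of means, this statistic responds to any difference in the shape of the two distributions, which is the property the comment above is asking for.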
In line 316 the authors state that "host-biased cells expressed a number of interferon-response genes (Figure S6 and Table S3), suggesting a potential role in resistance to infection". I think this claim is not fully supported by the data. Since single-cell RNA sequencing is a "zero sum" technique, cells with a higher proportion of viral gene expression are bound to show fewer host genes (as the authors have shown in Figure 6), including ISGs. Showing that these cells are indeed expressing more ISGs than the "virus-biased" cells would require sorting the different populations, as well as mock-infected cells, and measuring ISGs (by methods such as qPCR, RNA-seq, PrimeFlow, WB, etc.), or at least some analysis that takes into account the increased drop-off of host genes in cells with high levels of viral genes (something like a permutation test?).
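One way such an analysis might look is sketched below, under two stated assumptions that are not taken from the manuscript: ISG counts are first expressed as a fraction of host reads only (so that a cell with many viral reads is not penalized for having fewer host reads overall), and significance is then assessed by shuffling the group labels. All function names and numbers are hypothetical.

```python
import random
from statistics import mean


def isg_fraction_of_host(isg_counts, host_counts):
    """ISG reads as a fraction of host reads per cell (cells with no host reads are skipped)."""
    return [isg / host for isg, host in zip(isg_counts, host_counts) if host > 0]


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided label-shuffling test on the difference in group means.

    Returns (observed difference, p-value). The +1 terms keep the p-value
    from being reported as exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point rounding in the means.
        if abs(diff) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical counts: ISG reads and total host reads per cell.
    host_biased = isg_fraction_of_host([30, 32, 31, 29, 33], [100, 100, 100, 100, 100])
    virus_biased = isg_fraction_of_host([5, 6, 5, 4, 6], [50, 50, 50, 50, 50])
    print(permutation_test(host_biased, virus_biased))
```

Normalizing to host reads addresses the compositional effect only in part; sorted populations and mock-infected controls, as suggested above, would still be the more direct test.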
