VSGs expressed during natural T. b. gambiense infection exhibit extensive sequence divergence and a subspecies-specific expression bias

Abstract

Trypanosoma brucei gambiense is the primary causative agent of human African trypanosomiasis (HAT), a vector-borne disease endemic to West and Central Africa. The extracellular parasite evades antibody recognition within the host bloodstream by altering its Variant Surface Glycoprotein (VSG) coat through a process of antigenic variation. The serological tests that are widely used to screen for HAT use VSG as one of the target antigens. However, the VSGs expressed during human infection have not been characterized. Here we use VSG-seq to analyze the VSGs expressed in the blood of patients infected with T. b. gambiense and compare them to VSG expression in T. b. rhodesiense infections in humans as well as T. b. brucei infections in mice. The 44 VSGs expressed during T. b. gambiense infection revealed a striking bias towards expression of type B N-termini (82% of detected VSGs). This bias is specific to T. b. gambiense, which is unique among T. brucei subspecies in its chronic clinical presentation and anthroponotic nature, pointing towards a potential link between VSG expression and pathogenesis. The expressed T. b. gambiense VSGs also share very little similarity to sequences from 36 T. b. gambiense whole genome sequencing datasets, particularly in areas of the VSG protein exposed to host antibodies, suggesting that wild T. brucei VSG repertoires vary more than previously expected. Overall, this work demonstrates new features of antigenic variation in T. b. gambiense and highlights the importance of understanding VSG repertoires in nature.

Significance Statement

Human African Trypanosomiasis is a neglected tropical disease primarily caused by the extracellular parasite Trypanosoma brucei gambiense. To avoid elimination by the host, these parasites repeatedly replace their Variant Surface Glycoprotein (VSG) coat. Despite the important role of VSGs in prolonging infection, VSG expression during human infections is poorly understood. A better understanding of natural VSG gene expression dynamics can clarify the mechanisms that T. brucei uses to alter its VSG coat and improve trypanosomiasis diagnosis in humans. We analyzed the expressed VSGs detected in the blood of patients with trypanosomiasis. Our findings indicate that there are features of antigenic variation unique to human-infective T. brucei subspecies and that VSGs expressed in natural infection may vary more than previously expected.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    Reviewer #1:

    This is the first such piece of data to come from human infective parasites in the field. Technically this is a feat, because of the small number of parasites that are present per mL of human blood at any given time during infection with T. b. gambiense. Nevertheless they manage to identify up to 14 unique VSGs per patient sample. And this raises the first theoretical question: can they extrapolate to the average diversity load per human?

    This is an intriguing question that we would like to eventually answer, but we do not believe we can make this estimate from the data we currently have. We know our sampling is insufficient based on the correlation between parasitemia and diversity, and we do not have sufficiently precise estimates of parasitemia that could be used to extrapolate total diversity in the blood. Moreover, our analysis was only performed on RNA extracted from whole blood samples. Recent studies indicate that significant populations of parasites reside in extravascular tissue spaces, and our analysis did not address antigenic diversity in these spaces. We believe it is unlikely that the blood alone reflects the full diversity of VSG expression in an infection, and an estimate based only on blood-resident parasites (if possible) could be misleading.

    This is important because the timing of sample collection (i.e., that it occurred within a period of weeks) suggests that an initial group of infected tsetse infected these patients (rather than a small number of interactions between a bloodmeal and a new infection, generally in itself on the order of 1 month or so). If parasitemia is low and diversity limited, this would explain both why CATT works as well as it does (because really it shouldn't at all!) and perhaps even the chronicity of infection (in the sense that the organism is unlikely to "run out" even of complete VSGs, never mind mosaics). The paper would benefit from a direct discussion on this.

    Indeed, the timing of sample collection could inform our interpretation of the data. However, sample collection occurred over a period of six months. More importantly, patients were in both early and late stage disease at the time of sample collection, so we cannot estimate how long any individual patient had been infected. We have added text (line 180) to highlight this fact. Because some patients were infected at least 6 months apart (if not much more than that), it is unlikely that patients were infected around the same time by a small group of infected tsetse flies. Reviewer #1 introduces an interesting point about the efficacy of the CATT diagnostic test as it relates to antigenic diversity. We discuss CATT sensitivity in the introduction (lines 115-120) as well as the discussion, where regional sensitivity differences are mentioned (lines 715-718). Given uncertainty about total diversity and time since initial infection, we have refrained from speculating about how diversity/timing could affect CATT sensitivity.

    An interesting feature of this new study is the apparent bias to type B N-terminal domain VSGs as well as the discovery that two patients share a specific VSG isolate (though it is not mentioned whether they are related by distance etc). This raises the possibility of substrains with different VSG archives that vary by geography.

    We found two VSGs which were expressed in more than one patient. One was expressed in two patients from the same village (village C) while the other VSG was common between two cases originating from villages C and D, some 40 km apart. We agree that our data generally support the possibility that the VSG archive might vary geographically. We have performed additional analyses suggested by reviewer #2 that support this idea: we have now shown that Tbg patient VSGs classified in this study, which originated from the DRC, are distinct from the VSGs encoded by the reference strain Tbg DAL 972 which was isolated in Cote d’Ivoire. We mention this possibility on lines 721-724.

    Alternatively it suggests that perhaps type B VSGs are picked up differentially by serology (and there the one feature of type B VSGs that could be shared, with regards to detection, is the O-hexose decoration on a number of type B VSG surfaces). Could CATT be detecting elements common to sugar decorated VSGs? Experimentally this is something that can be tested even with mouse infection materials.

    This is indeed an intriguing possibility. We mention this in the discussion (lines 772-778): “In T. brucei, several VSGs have evolved specific functions besides antigenic variation [74]. Recently, the first type B VSG structure was solved [75], revealing a unique O-linked carbohydrate in the VSG’s N-terminal domain. This modification was found to interfere with the generation of protective immunity in a mouse model of infection; perhaps structural differences between each VSG type, including patterns of glycosylation, could influence infection outcomes.” While this is an experimentally tractable explanation for the type B VSG bias we observe, we believe such experiments are beyond the scope of the current paper.

    Side comment: are the common VSGs mutated between patient samples?

    We classified VSGs as common between patient samples if they had >98% nucleotide sequence identity as well as meeting the other quality cutoffs such as 1% expression level and consistency across technical replicates. This identity cutoff still allows for several mismatches between sequences, which we do occasionally observe. However, we cannot confidently rule out that the “mutations” we observe are sequencing or PCR errors. Thus, we cannot say for sure if there are mutations between common VSGs.
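    As an illustration only (this is not the VSG-seq pipeline code), the identity and expression cutoffs described above can be sketched as follows. The function names are hypothetical, the sequences are assumed to be already aligned to equal length with gaps counted as mismatches, and the replicate-consistency filter is omitted because its exact rule is not stated here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Gap characters are compared like any other symbol, so a gap opposite
    a base counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_common_vsg(seq_a: str, seq_b: str,
                  expr_a: float, expr_b: float,
                  min_identity: float = 98.0,
                  min_expr: float = 1.0) -> bool:
    """Return True if two VSGs from different patients count as 'common'.

    expr_a and expr_b are percentages of the expressed VSG population.
    Both variants must reach the expression cutoff and the pair must
    exceed the identity cutoff.
    """
    return (percent_identity(seq_a, seq_b) > min_identity
            and expr_a >= min_expr
            and expr_b >= min_expr)
```

    With a cutoff of 98%, two 1,000 bp sequences can still differ at up to 19 positions and be called common, which is why a handful of mismatches cannot by itself distinguish true mutations from sequencing or PCR errors.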

    Reviewer #2


    1.Throughout the manuscript you observe 'diversity' in expressed VSG and its existence becomes a principal conclusion. I feel that the meaning of diversity and its significance is not sufficiently explained for the reader. In the abstract (l48) you say that there is 'marked diversity' in parasite populations. Presumably you mean parasite infrapopulations, i.e. within patients, not across the DRC? In any case, what is 'marked' about it, and relative to what? Why does it matter that there are multiple expressed VSG in a single patient at one time? Is this not a reasonable expectation for a population of (presumably) clones capable of switching the expressed VSG? How is this different to the view typical of the literature since 1970 that one VSG dominates while others wait in the background at low frequencies. If 'diversity' is the conclusion, then you need to define it and explain its significance more.

    When we refer to diversity, we do mean infrapopulations of parasites within patients, or individual animals in this case, rather than across the DRC. We have edited the text to make this clear (see below). However, the study which benchmarked the application of VSG-seq to quantify VSG expression *in vivo* during mouse infection did not support the previously-held view that one VSG dominates while others wait in the background at low frequencies. Frequently we observe a handful of VSGs present at 10-20% of the population at any timepoint, and many VSGs (~50% of all detected variants) present at “In a proof-of-principle study, we used VSG-seq to gain insight into the number and diversity of VSGs expressed during experimental mouse infections [30]. This proof-of-principle study revealed significant *VSG* diversity within parasite populations in each animal, with many more variants expressed at a time than the few thought to be sufficient for immune evasion. This diversity suggested that the parasite’s genomic VSG repertoire might be insufficient to sustain a chronic infection, highlighting the potential importance of recombination mechanisms that form new VSGs.”

    2.Following on from 1., why does the analysis deal in counts of distinct VSG or N-terminal domains, and not then progress to their relative expression? The expression data are in Supp Table 3 and they show that, in most cases where many VSG are observed in the same patient, 1-3 of these are 'dominant', i.e. they account for >50% of the population.

    The VSG-seq analysis pipeline does estimate the relative expression level of each identified variant in the population, and this information is available in the supplemental data (Supplemental Figure 1, Supplemental Table 3). However, we chose not to rely on these measurements too heavily because there was some variation between Tbg technical replicates, which is shown in the supplemental heatmap (Supplemental Figure 1). Replicate three tends to not agree with the first two replicates. We suspect that this was due to the order of sample processing and the fact that the parasite-enriched cDNA sample was repeatedly freeze-thawed between library preparations for technical replicates. Additionally, because our sampling did not reach saturation, some VSGs are not detected in all replicate libraries, making it difficult to estimate their abundance.

    We have added a discussion of these issues to the text on lines 431-433: “Because our sampling did not reach saturation, resulting in some variability between technical replicates, we chose to focus only on the presence/absence of individual VSGs rather than expression levels within parasite populations.”

    Figure 1 deals in VSG counts, but I would then expect another figure to illustrate the reality that only a minority of these observed VSG are likely to be clinically relevant (i.e. the subject of the immune response). This impacts the 'diversity' conclusion, as given in the discussion (ll 657-9), because you cannot afford to treat all these VSG equally when their abundances are quite different.


    We agree that relative expression level is a useful metric, but absent longitudinal sampling it is impossible to determine which VSGs are clinically relevant as defined by the reviewer: low abundance VSGs at one time point may be the predominantly expressed variant at another. Moreover, the threshold for triggering an anti-VSG antibody response remains unknown. Thus, we have chosen to treat all detected variants equally.

    3.How related are these VSG? Were you able to ensure unique read mapping to the VSG assembly? Can you show that reads mapped to a single VSG only and therefore, that the RPKM values are reliable?

    Our analysis accounts for the fact that VSGs can be very similar. We only considered uniquely mapping reads in our VSG-seq analysis. We also account for mappability in our quantification, so VSG sequences that are less unique (and thus have fewer uniquely mapping reads) are not artificially underrepresented in estimates of relative expression. We have specified the parameters used for alignment (line 274) in the methods.
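    One way to picture the mappability correction is the sketch below. It is an assumption-laden illustration, not the published VSG-seq quantification: the function name, the input dictionaries, and the choice to scale each VSG's length by the fraction of positions that can receive a uniquely mapping read are all hypothetical.

```python
def relative_expression(unique_counts, lengths, mappable_fraction):
    """Percent of the expressed population for each VSG.

    unique_counts:      VSG name -> number of uniquely mapping reads
    lengths:            VSG name -> sequence length in bp
    mappable_fraction:  VSG name -> fraction (0-1] of positions where a
                        read can map uniquely (hypothetical input)

    Read counts are divided by an 'effective length' so that a VSG with
    few uniquely mappable positions is not penalized for having fewer
    uniquely mapping reads.
    """
    density = {}
    for vsg, n_reads in unique_counts.items():
        effective_len = lengths[vsg] * mappable_fraction[vsg]
        if effective_len <= 0:
            raise ValueError(f"{vsg} has no uniquely mappable positions")
        density[vsg] = n_reads / effective_len

    total = sum(density.values())
    if total == 0:
        return {vsg: 0.0 for vsg in density}
    return {vsg: 100.0 * d / total for vsg, d in density.items()}
```

    In this sketch, a VSG with half as many uniquely mapping reads as another, but only half of its length uniquely mappable, is assigned the same share of the population.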

    4.The authors observed no orthology between expressed VSG and DAL972 genes. This is really interesting and deserves closer attention. Presumably there is microhomology? For T. brucei VSG, with constant recombination, we would predict that a comparison of the VSG in West and Central Africa would reveal a pattern of mosaicism, such that individual sequences in DRC would break down into motifs present in multiple genes in the West African reference. Question is, how many genes? What does that distribution look like? What is the smallest homology tract? There is an opportunity here to comment on how VSG repertoires diverge under recombination. How much of the expressed VSG sequence is truly unrepresented in the West African reference (or other T.b.gambiense genome sequences available in ENA). I can believe that none of the N-terminal domains in these data are present intact in DAL972, but I cannot believe that their components are not present without evidence.

    We appreciate the reviewer’s suggestion to look at this more closely. We have performed additional analyses to address sequence similarity, or lack thereof, between the assembled DRC patient VSGs and the West African reference TbgDAL972. We ran a nucleotide BLAST of expressed VSGs against the TbgDAL972 genome reference sequence pulled from TriTrypDB.org (release 54). We have added supplemental figures depicting the results of this analysis (Supplemental Figures 6 and 7). Briefly, our analysis shows that most of the N-termini we identified have no significant similarity to DAL972 VSGs, even with very permissive search parameters. There are frequent hits in the VSG C-termini, however, which might be expected. Most BLAST hits are short spans 98% identity are short 20-25 bp regions. Given the large divergence from the reference, we were unable to infer any patterns of recombination in the VSGs. However, we believe this analysis supports our claim that the N-termini of VSGs assembled from DRC patients are novel, with their component parts largely unrepresented in the West African reference genome.

    5.Figure 4 compares NTD type composition in the DRC data with previously published mouse experiments. The latter take place over very short timescales in maladapted hosts, while the timescales of the former in natural hosts are unknown but plausibly very much longer. So are these data really comparable and are we learning anything from their comparison, given that the most likely explanation for the NTD bias in expressed VSG is the underlying genomic composition?


    Indeed, this is our intended conclusion from Figure 4. The figure is meant to illustrate our claim that the expressed VSGs in each experimental set reflect the underlying genomic composition of their corresponding reference strains, despite fluctuations over time. The language and legend for Figure 4 have been clarified to emphasize this point. We have emphasized in the text that it is unknown whether these fluctuations occur over time in much longer natural infections.


    6.Please comment on the technical reproducibility of the data, there are multiple instances in Supp Table 3 where technical replicates expressed different VSG.

    Three RNA-seq library technical replicates were prepared for each individual gHAT patient RNA sample. Replicates were prepared in batches, so that, for example, all replicate 1 libraries were prepared on the same day. The original parasite-enriched cDNA sample was frozen and thawed between each batch. We suspect that the cDNA degraded after repeated freeze-thaw cycles, which is why replicate three tends not to agree with the other two, as can be seen on the heatmap in Supplemental Figure 1 and the expression data in Supplemental Table 3. We also suspect that, because our sampling did not reach saturation, different VSGs were detected in individual replicate preps. We have edited the methods and mentioned this variability in the results section to communicate this issue more transparently.

    • Lines 395-397 “Using RNA extracted from 2.5 mL of whole blood from each patient, we prepared libraries for VSG-seq in three separate batches for each technical replicate.”
    • Lines 431-433: “Because our sampling did not reach saturation, resulting in some variability between technical replicates, we chose to only focus on the presence/absence of individual VSGs rather than relative expression levels within the population”

    Reviewer #3

    In line 499, the authors conclude that, because the expressed VSGs differ between the blood and CSF, different organs may harbor different VSG sets. Given that this is n=1 for patient samples I think this is too speculative a statement. There is also no indication as to whether the samples were taken at the same time or not.

    This is absolutely correct. The precise timing of CSF sample collection is unknown for these samples. It likely occurred within hours to days after blood collection, but even on this short time scale, the unique CSF repertoire could represent the antibody-mediated clearance of one VSG population and replacement with another. We have scaled back our language and only point out that there are unique VSGs in this space (Lines 522 – 524).

    I think that the authors need to be very careful as to the conclusions drawn about VSG expression over time in terms of hierarchy and N-terminal fluctuations. For any conclusions to be drawn on the hierarchy of VSG expression more data points are needed taken over time (this is obviously challenging when looking at patient samples). I find it too speculative to draw any conclusions when single time points are assessed and the assumption on the progression of the infection depends on whether it is a Tb or Tbr.

    Reviewer #2 also pointed this out. We agree and have attempted to limit definitive conclusions in the text and instead discuss multiple possible explanations behind our observations.

    I found some of the figure legends a bit terse. For example, in Figure 1 C, what do the black circles and lines represent? Perhaps a little more detail would help the reader.

    We have clarified the legends for the UpSet plots in Figures 1C and 3C as follows: “The intersection of expressed VSG sets in each patient. Bars on the left represent the size of the total set of VSGs expressed in each patient. Dots represent an intersection of sets with bars above the dots representing the size of the intersection.”
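    For readers unfamiliar with UpSet plots, the quantities in this legend can be computed as in the sketch below. This is a generic illustration with a hypothetical function name and made-up patient and VSG labels, and it assumes exclusive intersections (each VSG is counted once, under the exact combination of patients in which it was detected), which the legend does not specify.

```python
from collections import defaultdict


def upset_intersections(vsgs_by_patient):
    """Count VSGs for each exact combination of patients.

    vsgs_by_patient: patient ID -> set of VSG names detected in that patient.
    Returns a dict mapping a frozenset of patient IDs to the number of
    VSGs found in exactly those patients and no others.
    """
    # Record, for every VSG, the set of patients that express it.
    patients_by_vsg = defaultdict(set)
    for patient, vsgs in vsgs_by_patient.items():
        for vsg in vsgs:
            patients_by_vsg[vsg].add(patient)

    # Group VSGs by their exact patient combination.
    sizes = defaultdict(int)
    for patients in patients_by_vsg.values():
        sizes[frozenset(patients)] += 1
    return dict(sizes)


def set_sizes(vsgs_by_patient):
    """Total number of VSGs per patient (the bars on the left of the plot)."""
    return {patient: len(vsgs) for patient, vsgs in vsgs_by_patient.items()}
```

    Each key with a single patient corresponds to one filled dot, each key with several patients corresponds to dots joined by a line, and the value is the height of the bar above them.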

    In figure 2, I found it difficult to distinguish between the orange and dark red in (A) and the two lighter blue colors.

    We have changed the N-terminal type color palette for all plots to make the red and blue hues more distinct.

    In line 389 – estimate

    Corrected

    In line 498 - should the reference be to Figure 2C?

    This should be a reference to Figure 3B. We have corrected the reference.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #3

    Evidence, reproducibility and clarity

    Summary:

    In this work, So and Sudlow et al. have used an established methodology, VSG-seq, to assess the expressed VSG diversity in 12 patients infected with T. brucei gambiense. As in mouse models, VSG expression in patients is diverse. This technology has not previously been applied to patient samples and is now validated as a valuable tool to study antigenic variation in human populations. The authors have found that, in addition to the VSG diversity seen, there was a significant bias towards type B N-terminal domains and restricted C-terminal types. This work, although on a small sample group, is an important step forward to applying this technology to understanding trypanosome immune evasion in the field.

    Major comments:

    I think that overall, the key conclusions on the expressed VSG diversity and that there are geographical variations are convincing and would agree with the conclusions that it is now feasible to study antigenic variation in the field. But given the sample size, I feel that two of the findings are overstated and should at least be qualified as speculative.

    1.In line 499, the authors conclude that, because the expressed VSGs differ between the blood and CSF, different organs may harbor different VSG sets. Given that this is n=1 for patient samples I think this is too speculative a statement. There is also no indication as to whether the samples were taken at the same time or not.

    2.I think that the authors need to be very careful as to the conclusions drawn about VSG expression over time in terms of hierarchy and N-terminal fluctuations. For any conclusions to be drawn on the hierarchy of VSG expression more data points are needed taken over time (this is obviously challenging when looking at patient samples). I find it too speculative to draw any conclusions when single time points are assessed and the assumption on the progression of the infection depends on whether it is a Tb or Tbr. I don't believe that any other experiments are needed and the statistical analysis is adequate.

    Minor comments:

    I found some of the figure legends a bit terse. For example, in Figure 1 C, what do the black circles and lines represent? Perhaps a little more detail would help the reader.

    In figure 2, I found it difficult to distinguish between the orange and dark red in (A) and the two lighter blue colors.

    In line 389 - estimate

    In line 498 - should the reference be to Figure 2C?

    Significance

    Overall, this is an interesting study and shows the practical application of VSG-seq on the study of human infections. There is clearly interesting biology to be learned about both Tbg and Tbr infections and immune evasion by these parasites - which can now be done with the development and application of these technologies. I am a molecular cell biologist who specialises in trypanosome biology.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #2

    Evidence, reproducibility and clarity

    So et al. have analyzed the expression profiles of T.b.gambiense VSG genes in 12 natural human infections in DRC during a six month period of 2013, and compared these results to existing data for T.b.rhodesiense VSG and previously published data from mice. They use the VSG-seq approach developed by the Mugnier lab over the last few years to good effect and provide a description of the expression profiles using phylogenetic and network approaches. The main conclusions are that parasite infrapopulations in each patient express largely mutually exclusive VSG cohorts, with a couple of exceptions where patients 'shared' identical VSG transcripts. The authors note that these Congolese VSG are not comparable with the West African T.b.gambiense reference sequence, and there is a pronounced bias in the systematic composition of expressed VSG (towards 'B-type VSG') that is not observed in other T. brucei subspecies. These latter observations lead to the suggestion that there may be substantial variation in expressed VSG repertoire among T. brucei populations that could have important consequences for pathology, although the spatial or temporal scale upon which this variation could be expected to occur cannot be inferred from these data. Overall, a competent study and a welcome addition to, if not extension of, recent work describing the dynamics of VSG expression in multiple African trypanosomes.

    Major points:

    1.Throughout the manuscript you observe 'diversity' in expressed VSG and its existence becomes a principal conclusion. I feel that the meaning of diversity and its significance is not sufficiently explained for the reader. In the abstract (l48) you say that there is 'marked diversity' in parasite populations. Presumably you mean parasite infrapopulations, i.e. within patients, not across the DRC? In any case, what is 'marked' about it, and relative to what? Why does it matter that there are multiple expressed VSG in a single patient at one time? Is this not a reasonable expectation for a population of (presumably) clones capable of switching the expressed VSG? How is this different to the view typical of the literature since 1970 that one VSG dominates while others wait in the background at low frequencies. If 'diversity' is the conclusion, then you need to define it and explain its significance more.

    2.Following on from 1., why does the analysis deal in counts of distinct VSG or N-terminal domains, and not then progress to their relative expression? The expression data are in Supp Table 3 and they show that, in most cases where many VSG are observed in the same patient, 1-3 of these are 'dominant', i.e. they account for >50% of the population. Figure 1 deals in VSG counts, but I would then expect another figure to illustrate the reality that only a minority of these observed VSG are likely to be clinically relevant (i.e. the subject of the immune response). This impacts the 'diversity' conclusion, as given in the discussion (ll 657-9), because you cannot afford to treat all these VSG equally when their abundances are quite different.

    3.How related are these VSG? Were you able to ensure unique read mapping to the VSG assembly? Can you show that reads mapped to a single VSG only and therefore, that the RPKM values are reliable?

    4.The authors observed no orthology between expressed VSG and DAL972 genes. This is really interesting and deserves closer attention. Presumably there is microhomology? For T. brucei VSG, with constant recombination, we would predict that a comparison of the VSG in West and Central Africa would reveal a pattern of mosaicism, such that individual sequences in DRC would break down into motifs present in multiple genes in the West African reference. Question is, how many genes? What does that distribution look like? What is the smallest homology tract? There is an opportunity here to comment on how VSG repertoires diverge under recombination. How much of the expressed VSG sequence is truly unrepresented in the West African reference (or other T.b.gambiense genome sequences available in ENA). I can believe that none of the N-terminal domains in these data are present intact in DAL972, but I cannot believe that their components are not present without evidence.

    5.Figure 4 compares NTD type composition in the DRC data with previously published mouse experiments. The latter take place over very short timescales in maladapted hosts, while the timescales of the former in natural hosts are unknown but plausibly very much longer. So are these data really comparable and are we learning anything from their comparison, given that the most likely explanation for the NTD bias in expressed VSG is the underlying genomic composition?

    6.Please comment on the technical reproducibility of the data, there are multiple instances in Supp Table 3 where technical replicates expressed different VSG.

    Minor points:

    1. Typo: 'estimates', line 389

    Significance

    The significance of this work relates to the application of VSG expression profiling to natural human infections, something not previously done largely because human infections are rare and materials difficult to obtain. The approach and the conclusions are not novel and do not represent substantial advances on previous efforts, but have an important aspect in confirming for natural infections what has been observed in quite artificial experimental settings. Sample size is small and this means that the conclusions remain speculative and cannot readily be extended to all HAT settings. This is not a criticism, since the analysis of any human samples is progress, but it does mean that the study raises interesting questions (e.g. variation across the population in N-terminal domain usage) rather than providing definitive conclusions. It is likely to interest trypanosome biologists with a specific interest in antigenic variation.

    My own field concerns trypanosome genomics and the evolutionary dynamics of variant antigen genes.

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #1

    Evidence, reproducibility and clarity

    In this paper, Mugnier and colleagues describe the repertoire of VSGs present within a cohort of human HAT cases that occurred at relatively close geographical distance.

    VSG repertoires were first described by the senior author a few years ago already, from mouse infection data. This is the first such piece of data to come from human infective parasites in the field. Technically this is a feat, because of the small number of parasites that are present per mL of human blood at any given time during infection with T. b. gambiense. Nevertheless they manage to identify up to 14 unique VSGs per patient sample. And this raises the first theoretical question: can they extrapolate to the average diversity load per human? This is important because the timing of sample collection (i.e., that it occurred within a period of weeks) suggests that an initial group of infected tsetse infected these patients (rather than a small number of interactions between a bloodmeal and a new infection, generally in itself on the order of 1 month or so). If parasitemia is low and diversity limited, this would explain both why CATT works as well as it does (because really it shouldn't at all!) and perhaps even the chronicity of infection (in the sense that the organism is unlikely to "run out" even of complete VSGs, never mind mosaics). The paper would benefit from a direct discussion on this.

    An interesting feature of this new study is the apparent bias to type B N-terminal domain VSGs as well as the discovery that two patients share a specific VSG isolate (though it is not mentioned whether they are related by distance etc). This raises the possibility of substrains with different VSG archives that vary by geography. Alternatively it suggests that perhaps type B VSGs are picked up differentially by serology (and there the one feature of type B VSGs that could be shared, with regards to detection, is the O-hexose decoration on a number of type B VSG surfaces). Could CATT be detecting elements common to sugar decorated VSGs? Experimentally this is something that can be tested even with mouse infection materials.

    Side comment: are the common VSGs mutated between patient samples?

    Significance

    Significance: high in the sense that this is the first in-human field study of a disease that has been studied quite a lot in mouse models. Clearly from this work, there is still a lot to be learned from studying a disease in context.

    Audience: parasitologists

    My own expertise: parasitology and immunology

    Referees cross-commenting

    Nothing substantial to add. From the comments (all of which are worthwhile) I would suspect this would require minor revision.