Host-associated microbe PCR (hamPCR): accessing new biology through convenient measurement of both microbial load and community composition

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Overall, we agree that this new method is potentially impactful although the full versatility of the approach is currently unclear for several reasons. We appreciated the application of the approach to distinct systems and also the relatively low cost of this technique. The diagrams presented (particularly in Figure 3) nicely convey the steps in the protocol with expected sample outcomes to further facilitate the ability of other researchers to employ hamPCR. Overall, we are very positive about this work, but given that the impact of this paper rests on whether or not the technique is widely adopted, some revisions will lower the barrier to entry for future researchers to adopt this approach.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

This article has been Reviewed by the following groups

Abstract

The ratio of microbial population size relative to the amount of host tissue, or “microbial load”, is a fundamental metric of colonization and infection, but it cannot be directly deduced from microbial amplicon data such as 16S rRNA gene counts. Because conventional methods to determine load, such as serial dilution plating or quantitative PCR, add substantial experimental burden, they are only rarely paired with amplicon sequencing. Alternatively, whole metagenome sequencing of DNA contributed by host and microbes both reveals microbial community composition and enables determination of microbial load, but host DNA typically greatly outweighs microbial DNA, severely limiting the cost-effectiveness and scalability of this approach. We introduce host-associated microbe PCR (hamPCR), a robust amplicon sequencing strategy to quantify microbial load and describe interkingdom microbial community composition in a single, cost-effective library. We demonstrate its accuracy and flexibility across multiple host and microbe systems, including nematodes and major crops. We further present a technique that can be used, prior to sequencing, to optimize the host representation in a batch of libraries without loss of information. Because of its simplicity, and the fact that it provides an experimental solution to the well-known statistical challenges provided by compositional data, hamPCR will become a transformative approach throughout culture-independent microbiology.
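The distinction the abstract draws between composition and load can be illustrated with a small sketch. It is not the authors' pipeline: the function names, taxon labels, and read counts below are hypothetical, and it assumes that host and microbial amplicon counts come from the same library and are directly comparable.

```python
def relative_abundance(microbe_counts):
    """Fraction of microbial reads per taxon (the usual compositional view)."""
    total = sum(microbe_counts.values())
    if total <= 0:
        raise ValueError("no microbial reads")
    return {taxon: n / total for taxon, n in microbe_counts.items()}


def microbial_load(microbe_counts, host_count):
    """Microbial reads per host read, per taxon (load relative to host)."""
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    return {taxon: n / host_count for taxon, n in microbe_counts.items()}


# Hypothetical samples: same composition, tenfold difference in colonization.
sparse = {"taxon_A": 30, "taxon_B": 10}
dense = {"taxon_A": 300, "taxon_B": 100}
host_reads = 200  # assumed identical host amplicon count in both samples

same_composition = relative_abundance(sparse) == relative_abundance(dense)
load_sparse = microbial_load(sparse, host_reads)
load_dense = microbial_load(dense, host_reads)
```

In this toy case the relative abundances of the two samples are indistinguishable, while the host-normalized values differ tenfold, which is the information that microbial amplicon counts alone do not carry.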

Article activity feed

  1. Author Response:

    Reviewer #1 (Public Review):

    This work describes a novel approach, host-associated microbe PCR (hamPCR), that both quantifies microbial load relative to the host and describes interkingdom microbial community composition from the same amplicon library preparation. The authors used single-copy (or low-copy) host genes as PCR targets to set the host reference for microbial amplicons. To handle the common problem that host DNA is in great excess over microbiome DNA, the authors adjusted the host-to-microbe amplicon ratio before sequencing. As proof of concept, hamPCR was tested with synthetic communities, compared to shotgun metagenomics results, and applied to biological systems involving interkingdom microbial communities (oomycetes and bacteria), diverse hosts, and crop hosts with large genomes. Substantial data from diverse biological systems confirmed that the hamPCR approach is accurate, versatile, easy to set up, and low in cost, that it improves sample capacity, and that it reveals phenomena invisible to regular microbial amplicon sequencing approaches.

    Since amplification of host genes is the key step of the hamPCR approach, the authors might also discuss in more depth the strategy for selecting single (low-copy) genes for a specific host and for designing the host gene primers, to ensure that hamPCR can be used in biological systems other than those mentioned in the manuscript.

    A deeper discussion about the design of suitable host primers has been added to the Supplementary Information as Supplementary Discussion 3, and is now mentioned in the main text in the first section of the Methods.

    Reviewer #2 (Public Review):

    Lundberg and colleagues provide a detailed set of data showing the utility of host-associated microbe PCR. By simultaneously amplifying microbial community and host DNA, hamPCR provides an opportunity to measure the microbial load of a sample. I was largely convinced about the robustness of this approach after seeing the many different optimization datasets that were presented in the paper. I also appreciated the various applications of hamPCR that were demonstrated and compared to other standard approaches (CFU counting and shotgun metagenomics, for example). As clearly illustrated in Figure 6f, hamPCR could dramatically improve our understanding of interactions within microbiomes as it helps remove issues of relative abundance data.

    One challenge about the approach presented is that it cannot be quickly adapted to a new system. Unlike most primers for 'standard' microbial amplicon sequencing, considerable time will be required to determine which host gene to target, how to make that host gene size larger than the size of the microbial amplicon, etc. This may limit wide adoption of hamPCR in the field. I do appreciate the authors providing some details in the Supplement on how they developed hamPCR for the several different systems described in this paper. The helpful tips may make it easier for others to develop hamPCR for their own systems.

    Additional primer design strategy is addressed in our response to Reviewer #1's public review.

    An issue that repeatedly came up is that at high and low ends of host:microbe ratios, inaccurate estimates can occur. For example, with high levels of microbial infection, the authors note that hamPCR has reduced accuracy. The authors propose three solutions to this problem (1. altering the host:microbe amplicon ratio, 2. using a host gene with a higher copy number, and 3. adjusting the concentrations of host primers), but only present data for #1 and #3. Do they have any data to show that #2 would actually work?

    One instance of potentially unreliable load that sticks out in the paper is in Figure 5b. The authors note that this is likely due to unreliable load calculation. Is this just one of 4 replicates? What are other potential reasons this would be an outlier and how can the authors rule this out? Did they repeat the hamPCR for this outlier to confirm the striking difference from the other three samples in the eds1-1 Hpa + Pto sample?

    Both qPCR and amplicon sequencing can be used to detect copy number variation in genomes [1]. Because amplicon-based methods are known to be sensitive to small differences in gene copy number, we are confident, without generating additional data on the topic, that #2 would work.

    Furthermore, bacterial genomes from different taxa are known to vary in their copy number of 16S rDNA, usually between 1 and about 15 copies [2]. These variations are reflected in sequence counts from amplicon sequencing, biasing the counts towards taxa with more 16S rDNA gene copies [2, 3, 4]. This well-documented phenomenon distorts the description of microbial communities and has therefore led to some efforts to correct 16S rDNA gene amplicon data by dividing the counts from each taxon by the (estimated) 16S rDNA copy number of that taxon, so that the counts better reflect the numbers of bacterial cells.
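    The correction described above (dividing each taxon's counts by its estimated 16S rDNA copy number) can be sketched as follows. This is a minimal illustration only: the function name, taxon labels, counts, and copy numbers are hypothetical, and the references above [3, 4] caution that such corrections depend on copy-number estimates that are often uncertain.

```python
def correct_for_copy_number(counts, copy_numbers):
    """Divide each taxon's read count by its (estimated) 16S copy number."""
    corrected = {}
    for taxon, n in counts.items():
        copies = copy_numbers.get(taxon)
        if copies is None or copies <= 0:
            raise ValueError(f"no usable copy number for {taxon}")
        corrected[taxon] = n / copies
    return corrected


# Hypothetical data: taxon_A carries 7 copies of the 16S gene, taxon_B only 1.
raw_counts = {"taxon_A": 700, "taxon_B": 100}
estimated_copies = {"taxon_A": 7, "taxon_B": 1}

cell_estimates = correct_for_copy_number(raw_counts, estimated_copies)
```

    Under these assumed values, taxon_A appears sevenfold more abundant in the raw counts, yet both taxa correspond to the same estimated number of cells after correction.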

    Because amplicon methods are sensitive to copy number variation (whether those copies come from inside the same cell or from different cells), we reasoned that choosing a host gene with a higher copy number will, similar to the effects of copy number variation on 16S rDNA gene counts, increase the representation of that host amplicon in the final library (because there will be more template host DNA molecules available to amplify). We did not test this explicitly; we think the evidence from the literature provides strong support on its own. We have added to the paper a statement that now references the Kembel 2012 paper, which we hope adequately supports our claim:

    “Second, a host gene with a higher copy number could be chosen for HM-tagging throughout the entire project, which would increase host representation by a factor of that copy number (Kembel et al., 2012).”
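    The expected effect of host gene copy number can be sketched with a simple template-counting model. The model is an assumption made for illustration, not something tested in the paper: it supposes that host and microbial templates amplify with equal efficiency, so that library shares track template numbers. All names and values are hypothetical.

```python
def host_fraction(host_genomes, host_copy_number, microbial_templates):
    """Expected host share of the library under equal amplification efficiency."""
    host_templates = host_genomes * host_copy_number
    total = host_templates + microbial_templates
    if total <= 0:
        raise ValueError("no template molecules")
    return host_templates / total


def load_per_host_genome(microbe_reads, host_reads, host_copy_number):
    """Microbial reads per host genome, after dividing out the host copy number."""
    if host_reads <= 0 or host_copy_number <= 0:
        raise ValueError("host_reads and host_copy_number must be positive")
    return microbe_reads / (host_reads / host_copy_number)


# Hypothetical heavily colonized sample: 100 host genomes, 900 microbial templates.
single_copy_share = host_fraction(100, 1, 900)  # host gene present once per genome
four_copy_share = host_fraction(100, 4, 900)    # host gene present four times

# Reads proportional to templates in the four-copy case (400 host, 900 microbial).
load = load_per_host_genome(microbe_reads=900, host_reads=400, host_copy_number=4)
```

    In this model the number of host templates scales exactly with copy number, whereas the host share of the library rises less than proportionally because the total grows as well. Dividing host reads by the copy number recovers a load expressed per host genome, so values obtained with different host genes remain comparable in principle.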

    1. Martins, W. F. S., Subramaniam, K., Steen, K., et al. (2017). Detection and quantitation of copy number variation in the voltage-gated sodium channel gene of the mosquito Culex quinquefasciatus. Scientific Reports, 7, 5821. https://doi.org/10.1038/s41598-017-06080-8

    2. Kembel, S. W., Wu, M., Eisen, J. A., & Green, J. L. (2012). Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Computational Biology, 8(10), e1002743. https://doi.org/10.1371/journal.pcbi.1002743

    3. Starke, R., Pylro, V. S., & Morais, D. K. (2021). 16S rRNA Gene Copy Number Normalization Does Not Provide More Reliable Conclusions in Metataxonomic Surveys. Microbial Ecology, 81(2), 535–539. https://doi.org/10.1007/s00248-020-01586-7

    4. Louca, S., Doebeli, M., & Parfrey, L. W. (2018). Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem. Microbiome, 6(1), 41. https://doi.org/10.1186/s40168-018-0420-9

    Could the DNA extraction method used cause biases in hamPCR for/against either the host or the microbiome? If two different labs study the same system (let's say bacterial communities growing on Arabidopsis leaves) but use different DNA extraction approaches, would we expect them to obtain different answers using hamPCR? Did the authors try several different DNA extraction methods to see if this is an issue? Or has another team of researchers considered this and addressed it in a separate paper? I would appreciate seeing either data to address this or a discussion paragraph that reasons through this.

    Differences in DNA extraction method will certainly change the results, affecting not only the microbe-to-plant ratio but also the representation of microbes, because microbes differ in their sensitivity to different lysis methods. This is a well-documented concern in microbiome studies and has been demonstrated by using different methods on the same mock community in papers such as the following:

    Yuan, S., Cohen, D. B., Ravel, J., Abdo, Z., & Forney, L. J. (2012). Evaluation of methods for the extraction and purification of DNA from the human microbiome. PloS One, 7(3), e33865. https://doi.org/10.1371/journal.pone.0033865

    Albertsen, M., Karst, S. M., Ziegler, A. S., Kirkegaard, R. H., & Nielsen, P. H. (2015). Back to Basics--The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PloS One, 10(7), e0132783. https://doi.org/10.1371/journal.pone.0132783

    In short, if the DNA is not extracted because plant or microbial cells are not lysed, it cannot be amplified in PCR. However, there is a good overall strategy to minimize the problem, as also proposed in the above papers, and that is to err on the side of a harsher lysis (using strong bead beating, as we have done), since this will leave fewer cells unlysed (and thus less information will be hidden). We note that similar concerns about lysis methods changing results also apply to DNA extraction for qPCR and live bacterial isolation for CFU counting (for which too harsh a lysis will kill bacteria, but too gentle a lysis will not release them from host tissue).

    We addressed this in two places. First, in the results section we mention briefly the following:

    “All DNA preps employed heavy bead beating to ensure thorough lysis of both host and microbes, as an incomplete DNA extraction can lead to underrepresentation of hard-to-lyse cells (Albertsen et al., 2015; Yuan et al., 2012).”

    Second, we added a paragraph to the discussion about sample selection and DNA extraction as follows:

    “Because hamPCR can only quantify the DNA available in the template, choice of sample and appropriate DNA extraction methods are very important. In particular, the sample must in the first place include a meaningful quantity of host DNA. For example, although there is some host DNA in mammalian fecal samples or in plant rhizosphere soil samples, this host DNA does not accurately represent the sample volume, and therefore relating microbial abundance to host abundance probably has less value in these cases. Further, the DNA extraction method chosen must lyse both the host and microbial cell types. An enzymatic lysis suitable for DNA extraction from pure cultures of E. coli may not lyse host cells or even other microbes. Appropriate DNA preparation methods for metagenomics have been thoroughly evaluated elsewhere (Albertsen et al., 2015; Yuan et al., 2012), and a common point of agreement is that strong bead-beating increases the yield and completeness of the DNA extraction, but comes at the cost of some DNA fragmentation. Especially for short reads, as we have used here, this fragmentation is not a problem, and we recommend to err on the side of a harsher lysis, using strong bead beating potentially preceded by grinding steps using a mortar and pestle as necessary for tougher tissue.”

    One emerging theme in microbiome science is to have consistent methodologies that are used across studies/labs to allow direct comparisons of microbiome datasets. Standardization of approaches may make microbiome science more robust in the long term. Given much of the nuance in developing hamPCR for different systems, my impression is that this method is best for comparing samples within a particular host-microbe system and not across systems. For example, it may be challenging to directly compare my bacterial load hamPCR data from Arabidopsis to another lab's if we used different Arabidopsis host genes or if we used different 16S gene regions. Can the authors unpack this a bit in a discussion paragraph? If it is widely adopted, is there a way to standardize hamPCR so that it can be consistently used and compared across datasets? Or should that not be the goal?

    There appears to be considerable non-specific amplification or dimers in the gels presented throughout the manuscript. Could this non-specific amplification vary across host-microbe primer combinations? Would this impact quantification of host and microbial amplicons?

    Non-specific amplification and dimers do vary across host-microbe primer combinations. Indeed, they also vary between common 16S rRNA primer pairs used on their own (not shown). Fortunately, non-specific amplicons amplified during the exponential PCR step do not, at least with our method, seem to impact quantification of host and microbial amplicons.

    One reason is that non-specific amplicons can be recognized by their sequence and ignored. After the sequences of the amplicons have been extracted from the short read data, only those that match expected length and sequence patterns of the targeted amplicons need to be counted. Non-specific amplicons are certainly a nuisance because they represent wasted sequencing resources, but they can be excluded bioinformatically and therefore do not change the accuracy of the microbial load measurement. This is in contrast to ddPCR/qPCR, for which any off-target amplicons are also quantified!
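    The bioinformatic exclusion described above can be sketched as a simple filter that counts only reads matching an expected sequence pattern and length window. This is an illustrative simplification, not the authors' pipeline: the function name, primer prefixes, length windows, and toy reads are all hypothetical, and real workflows typically tolerate mismatches and handle read orientation.

```python
def count_on_target(sequences, targets):
    """Count reads per target; anything unmatched is tallied as off-target.

    targets maps a name to (expected_prefix, min_length, max_length).
    """
    counts = {name: 0 for name in targets}
    counts["off_target"] = 0
    for seq in sequences:
        for name, (prefix, min_len, max_len) in targets.items():
            if seq.startswith(prefix) and min_len <= len(seq) <= max_len:
                counts[name] += 1
                break
        else:
            # No target matched: ignore this read for load calculation.
            counts["off_target"] += 1
    return counts


# Hypothetical targets: the host amplicon is longer than the microbial one.
targets = {
    "microbe": ("ACGT", 8, 12),
    "host": ("TTGA", 13, 17),
}
reads = [
    "ACGT" + "A" * 6,   # microbial amplicon, length 10
    "TTGA" + "C" * 11,  # host amplicon, length 15
    "GGGG",             # unrelated sequence
    "ACGT",             # right prefix but too short (e.g. a primer dimer)
]

tally = count_on_target(reads, targets)
```

    Only the "microbe" and "host" tallies would enter a load calculation; the off-target tally indicates how much sequencing effort was wasted without affecting the ratio itself.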

    A second reason is that the sensitive exponential amplification step of hamPCR is done with a single primer pair. Off-target sequences do squander PCR reagents, including primers and dNTPs, such that these become limiting at earlier cycles than they would without off-target sequences. However, because the exponential PCR step is done with a single primer pair, such inferior amplification conditions are shared by all molecules and therefore do not differentially affect the host or microbial amplicon. Any off-target binding occurring in the initial tagging reaction (before the PCR step) would certainly be a concern if the reaction were carried on long enough, because, for example, the microbial primer pair might become limiting at an earlier cycle number, leading to underestimates of microbial load. However, limiting the tagging reaction to a low number of cycles ensures that, should primers targeting a particular host or microbial amplicon be non-specific, the fraction still available to bind the correct sequence remains in excess.

  2. Reviewer #3 (Public Review):

    Strengths: It is clear through this manuscript that the authors intend for this to be a useful approach for as many fields as possible. While previous technical approaches to maximize the capture of members of microbiomes fail to translate to other environments or hosts, the authors demonstrate the utility of hamPCR by testing it in a number of other systems. The diagrams presented (particularly in Figure 3) nicely convey the steps in the protocol with expected sample outcomes to further facilitate the ability of other researchers to employ hamPCR.

    Weaknesses: The challenge of demonstrating widespread utility in other systems is creating and maintaining a biologically driven narrative. While this is not necessary if the goal is simply to show that a technique works, it does help to highlight the importance of implementing a new method and to increase the likelihood that it will be adopted by other researchers.
