Efficient and stable metabarcoding sequencing from DNBSEQ-G400 sequencer examined by large fungal community analysis

Abstract

Metabarcoding has become the de facto method for characterizing the structure of microbial communities in complex environmental samples. To determine how the sequencing platform may influence microbial community characterization, we present a large-scale comparison of two sequencing platforms: Illumina MiSeq and a new platform, DNBSEQ-G400, developed by MGI Tech. We evaluated the accuracy of DNBSEQ-G400 on bacterial and fungal mock samples and compared sequencing consistency and precision between the DNBSEQ-G400 and MiSeq platforms by sequencing the fungal ITS2 region from 1144 soil samples with 3 technical replicates. The DNBSEQ-G400 showed high accuracy in reproducing mock communities containing different proportions of bacteria and fungi, respectively. The taxonomic profiles of the 1144 soil samples generated by the two DNBSEQ-G400 modes closely resembled each other and were highly correlated with those generated by the MiSeq platform. Analyses of technical replicates demonstrated a run bias against certain taxa on the MiSeq but not on the DNBSEQ-G400 platform. Based on lower cost, greater capacity, and less bias, we conclude that DNBSEQ-G400 is an optimal platform for short-term metabarcoding of microbial communities.

IMPORTANCE

Experimental steps that generate sequencing bias during amplicon sequencing have been intensively evaluated, including the choice of primer pair, polymerase, PCR cycle number and technical replication. However, few studies have assessed the accuracy and precision of different sequencing platforms. Here, we compared the performance of the newly released DNBSEQ-G400 sequencer with that of the commonly used Illumina MiSeq platform by leveraging amplicon sequencing of a large number of soil samples. Significant sequencing bias among major fungal genera was found in parallel MiSeq runs, a bias that can easily be overlooked without the use of sequencing controls. We emphasize the importance of technical controls in large-scale sequencing efforts and provide DNBSEQ-G400 as an alternative with increased sequencing capacity and more stable reproducibility for amplicon sequencing.

Article activity feed

  1. ABSTRACT

    This work has been peer reviewed in GigaByte (https://doi.org/10.46471/gigabyte.16), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    **Reviewer 1. Inge Seim**

    Is the language of sufficient quality?
    No. The authors need to polish their English further. This is particularly obvious in the Abstract and is likely to result in an unwarranted lower readership of the work.

    Are all data available and do they match the descriptions in the paper?
    Yes. I want to commend the authors for sharing data and associated code.

    Is there sufficient data validation and statistical analyses of data quality?
    Not my area of expertise.

    Any Additional Overall Comments to the Author
    • R2 should be R^2 (that is, please superscript the '2').
    • The sentence 'Further comparison between sequencing platforms would be useful for for exploration using as similar amplification conditions as possible. This data being provided as one such benchmark' at the end of Results is vague and needs to be rewritten.
    • You need to state more clearly that you do not recommend combining MGI and Illumina data sets for metabarcoding, unlike e.g. BGISEQ-500 and Illumina RNA-seq/short-insert WGS data, which can be readily combined.

    Recommendation: Minor Revision

    **Reviewer 2. Petr Baldrian**

    Are all data available and do they match the descriptions in the paper?
    No. I was not able to locate the items listed as references (26) and (27). Due to this, I was not able to fully evaluate the paper.

    Are the data and metadata consistent with relevant minimum information or reporting standards?
    No. I was not able to locate the data, see above.

    Is the data acquisition clear, complete and methodologically sound?
    No. More details on sampling (mode of sampling, area sampled, depth sampled, sample size, sample handling) are missing. Information on the number of repeated DNA extractions and the size of the sample used for extraction is also missing. Protocols of amplification and barcoding are referenced as (27), but I was not able to locate this reference. These details have to be provided in the text for both types of sequencers.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?
    Yes. For fungal ITS, the ITS region should be extracted before annotation.

    Is there sufficient data validation and statistical analyses of data quality?
    No. The authors do not report how they deal with sequences of fungi that produce amplicons longer than 350 bases, which cannot be joined as paired-end reads in the 2x200 base runs. Even the MiSeq 2x250 runs miss some fungal taxa (though not very many), and here the situation is even worse. For the length distribution of fungal ITS, please consult the UNITE database.
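
    The length limit raised in this comment follows from simple arithmetic: two reads of length R can only be joined if they overlap, so the longest mergeable amplicon is 2R minus the required overlap. The sketch below illustrates this under stated assumptions: the minimum overlap of 50 bases is chosen only because it reproduces the reviewer's 350-base figure for 2x200 runs, and the amplicon lengths are invented for illustration, not taken from the study or from UNITE.

```python
def max_mergeable_length(read_length: int, min_overlap: int) -> int:
    """Longest amplicon whose two paired-end reads still overlap by min_overlap bases."""
    return 2 * read_length - min_overlap


def fraction_unmergeable(amplicon_lengths, read_length, min_overlap):
    """Share of amplicons that are too long to be merged at the given read length."""
    limit = max_mergeable_length(read_length, min_overlap)
    too_long = sum(1 for length in amplicon_lengths if length > limit)
    return too_long / len(amplicon_lengths)


# Hypothetical ITS2 amplicon lengths in bases, invented for illustration only.
example_lengths = [280, 310, 330, 345, 360, 390, 420, 460]

# min_overlap=50 is an assumption, not a value reported by the authors.
print(max_mergeable_length(200, 50))                    # limit for a 2x200 run
print(max_mergeable_length(250, 50))                    # limit for a 2x250 run
print(fraction_unmergeable(example_lengths, 200, 50))   # share lost at 2x200
print(fraction_unmergeable(example_lengths, 250, 50))   # share lost at 2x250
```

    Running the same calculation on a real length distribution would show how many taxa each read format is expected to miss before any sequencing is done.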

    Is the validation suitable for this type of data?
    No. There should be additional validations including the analysis of those OTUs that are abundant in one setup but missing in another one (if any).
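
    One possible shape for the validation requested here is a filter that lists OTUs exceeding an abundance threshold on one platform while having zero reads on the other. This is a minimal sketch, not the authors' analysis: the function name, the 1% relative-abundance threshold, and the read counts are all hypothetical.

```python
def platform_specific_otus(counts_a, counts_b, min_relative_abundance=0.01):
    """Return OTUs at or above the threshold in counts_a that have no reads in counts_b."""
    total_a = sum(counts_a.values())
    if total_a == 0:
        return []
    return sorted(
        otu
        for otu, reads in counts_a.items()
        if reads / total_a >= min_relative_abundance and counts_b.get(otu, 0) == 0
    )


# Hypothetical read counts per OTU for the same sample on two platforms.
miseq = {"OTU1": 500, "OTU2": 300, "OTU3": 195, "OTU4": 5}
dnbseq = {"OTU1": 520, "OTU2": 310, "OTU4": 4, "OTU5": 166}

print(platform_specific_otus(miseq, dnbseq))   # abundant on MiSeq, absent on DNBSEQ
print(platform_specific_otus(dnbseq, miseq))   # abundant on DNBSEQ, absent on MiSeq
```

    The threshold matters: set too low, rare OTUs that drop out by chance will dominate the list, so it would need to be justified against sequencing depth.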

    Is there sufficient information for others to reuse this dataset or integrate it with other data?
    No. The metadata, supposedly in reference (26), are impossible to locate.

    Any Additional Overall Comments to the Author
    I believe that this is a very good attempt to test the novel platform with fungal metabarcoding. If all required information is provided, I believe that this can be both an interesting paper and a valuable dataset.

    Recommendation: Reject (Unsound or Unusable)

    **Reviewer 2. Re-review.**

    I have now carefully read the revised version of this manuscript and I am happy with the changes that the authors implemented in response to my comments and the comments of the other reviewer. The paper is now much clearer, especially in the methodological section, and the limitations of the use of the novel sequencing platforms/formats are sufficiently discussed.

    Minor comments that should be made in the present paper:

    • L58: change "bacteria" to "bacterial"
    • L65-66: the last part of this long sentence is difficult to comprehend and should be rephrased. I suggest dividing the long sentence into two.
    • L68-69: change "produces" to "produced"
    • L84: delete "in"
    • L98: please explain the abbreviation "ONT", likely "Oxford Nanopore Technologies"
    • L162: the details of the amplification methods should be expanded, at least stating the primer pairs (names and sequences) used and the targeted molecular markers; from the text it appears that ITS2 was the marker selected, yet lines 361 and 366 discuss length differences in ITS1
    • L246: replace "common fungi several species" with "common fungal species"
    • L248-251: the misclassification of fungal taxa was not due to bad performance of the sequencing platform; it was because of the low variability of the ITS2 marker. I suggest changing the text to state that genus-level assignment was reached for these taxa since multiple species had the same ITS2 sequence.
    • L264-265: the main reason is that PCR bias (preferential PCR amplification of certain templates) skews the representation of taxa if the DNA is mixed prior to amplification
    • L331-346: this section is unclear; it should be specified which primers (primer names and sequences) with what barcodes were used for each condition. If different primer pairs were used for different sequencing platforms, it is unclear what the use of this comparison is. This should either be clarified and explained, or the whole section may be removed.
    • L381: delete "so"
    • L387-392: I suggest that this part is either removed or that it is clearly described why the authors are sure that PCR replicates are not necessary (which is against all present recommendations). While the increasing fidelity of polymerases can be a fact, the main problem with parallel PCR is not errors (due to low fidelity) but random effects, where primers align to templates with random frequencies. This statistical effect is impossible to handle by increasing polymerase fidelity, while it is easily handled by PCR replication.
    • L424-426: this statement is rather obvious; I suggest deleting it.
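
    The point about PCR replication in the comment on L387-392 can be illustrated with a toy simulation. The sketch below assumes a deliberately simple binomial model in which each primed template belongs to a given taxon with a fixed probability and the polymerase makes no errors at all, so any spread in the result comes from random priming alone. The model, the parameter values, and the function names are illustrative assumptions, not the reviewer's or the authors' method.

```python
import random
import statistics


def observed_fraction(true_fraction, n_templates, rng):
    """Fraction of one taxon among n_templates molecules primed at random in one PCR."""
    hits = sum(1 for _ in range(n_templates) if rng.random() < true_fraction)
    return hits / n_templates


def spread_of_estimates(true_fraction, n_templates, n_replicates, n_trials, rng):
    """Standard deviation, across trials, of the mean over n_replicates PCRs."""
    estimates = []
    for _ in range(n_trials):
        reps = [
            observed_fraction(true_fraction, n_templates, rng)
            for _ in range(n_replicates)
        ]
        estimates.append(statistics.mean(reps))
    return statistics.stdev(estimates)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# A taxon at 10% true abundance, 100 templates primed per reaction (assumed values).
single = spread_of_estimates(0.1, 100, n_replicates=1, n_trials=1000, rng=rng)
triple = spread_of_estimates(0.1, 100, n_replicates=3, n_trials=1000, rng=rng)
print(f"single PCR: {single:.4f}, mean of three PCRs: {triple:.4f}")
```

    In this model the spread shrinks roughly with the square root of the number of replicates, even though fidelity is perfect, which is the reviewer's argument that replication and fidelity address different sources of variation.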