Next-Generation Sequencing Methods for Sensitive Characterisation of Hepatitis B Viral Genomes: A European Multicentre Study

Abstract

Background

Choosing a next-generation sequencing (NGS) workflow to enable sensitive and rapid detection of infectious agents remains critical. This collaborative study compared multiple NGS methods to detect and generate hepatitis B virus (HBV) genomes in samples of low viral load.

Methods

23 HBV DNA-positive plasma samples of genotypes A-E (0.2 to 6207 IU/ml) and one control sample were assayed blindly via 9 NGS methods from 6 European laboratories. Methods included untargeted metagenomics, pre-enrichment by probe-capture followed by Illumina sequencing, and HBV-specific PCR pre-amplification followed by sequencing with Nanopore or Illumina. Construction of consensus sequences was performed at the coordinating centre.

Results

Full HBV genomes were constructed at viral loads >1000 IU/ml for probe-capture methods, >200 IU/ml for PCR-Illumina methods, >10 IU/ml for PCR-Nanopore methods, and in no samples for metagenomic methods. Contamination was observed in the negative control and samples with very low viral loads for PCR-based methods. Probe-capture and metagenomic methods detected additional viruses not routinely screened in blood donations; positive results were confirmed by PCR. Costs were lowest for PCR-Nanopore, and turnaround time was highest for probe-capture methods.

Conclusion

Different methods have different advantages, and the optimal method depends on the context. NGS has the potential to delineate whole-genome sequences at low viral loads if supported by a PCR pre-amplification step. Probe-capture methods also reliably detect HBV at low viral loads but limit genome characterisation while accommodating the incidental detection of other virus species. Stringent steps are needed to prevent cross-contamination or bioinformatic noise to maximise diagnostic accuracy.

Importance

There is a great need in clinical microbiology and public health to sequence whole genomes of different viruses. Because so many different methods can be used for sequencing, it can be difficult to choose a suitable method. We compared commonly used methods on blood donor samples with low levels of hepatitis B virus DNA. We found that each method had its own benefits; however, our study highlights the need for increased vigilance regarding the risk of contamination. Our study provides necessary data for clinical and research laboratory teams to make informed decisions about the most suitable method. This would advance our understanding of hepatitis B and other viruses, ultimately benefiting both current and future patients.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16945679.

    PAPER SUMMARY

    This multicenter comparative study tackles an important and timely challenge: evaluating next-generation sequencing (NGS) methods for HBV genome detection in low viral load samples. The authors explore a diverse range of approaches (untargeted metagenomics, probe capture, and PCR-based methods using Illumina and Oxford Nanopore platforms) across six laboratories. The work is ambitious and potentially impactful, especially in clinical contexts where sensitive and specific HBV detection is critical. However, substantial issues with contamination, data interpretation, and statistical methodology currently limit the reliability of the comparative conclusions. The study lays strong groundwork for method optimization, and with targeted revisions, it can offer valuable insight into the future of HBV diagnostics.

    MAJOR REVISIONS

    1. Contamination Across PCR-Based Methods

    Critique:

    The detection of HBV sequences with 48-100% genome coverage in negative controls across all four PCR-based protocols represents a fundamental quality control failure that requires immediate attention to preserve the study's scientific validity. A false positive result in a negative control mandates investigation and a review of method validation; without these, the entire analytical run is potentially invalid. The high genome coverage of the false positives points to contamination rather than background noise and needs to be addressed.

    Framework for Addressing:

    • Acknowledge the limitation explicitly and transparently in the manuscript.

    • Reprocess samples with enhanced contamination control protocols (separate spaces, extraction blanks, environmental testing).

    • Delay comparative performance claims for PCR methods until a contamination-free dataset is available.

    • Increase the number and distribution of negative controls to monitor contamination throughout runs.

    2. Statistical Misapplication in Correlation Analyses

    Critique:

    Figures 1, 2, and 4 contain a fundamental statistical reporting error that requires immediate correction to ensure accurate interpretation of the correlation analyses. The figures display R² values for Spearman correlation analyses, which is methodologically incorrect: Spearman correlations measure monotonic relationships and should be reported as the correlation coefficient ρ (rho) rather than the coefficient of determination. Some figures display negative R² values, which are mathematically impossible for a coefficient of determination computed as a squared correlation, further confirming the inappropriate statistical notation. Moreover, a Spearman coefficient quantifies the strength of an association; it does not by itself establish statistical significance.

    Framework for Addressing:

    • Replace all R² values with correctly reported Spearman ρ (rho) values and include 95% CIs.

    • Apply censored data models (e.g., Tobit regression or Kaplan-Meier) to properly handle non-detects.

    • For diagnostic performance, consider ROC analyses and logistic regression to assess detection probability versus viral load.
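    As an illustration of the first recommendation above, the following is a minimal sketch of how Spearman's ρ and a percentile-bootstrap 95% CI could be computed. It uses only the Python standard library; the function names, the bootstrap settings, and the example values are hypothetical and are not taken from the manuscript. Note that ρ can legitimately be negative, whereas its square cannot.

```python
import random
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rho, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman_rho([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # drop degenerate resamples where rho is NaN
            stats.append(rho)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical viral loads (IU/ml) and genome coverage (%), for illustration only.
    viral_load = [0.5, 3, 12, 40, 150, 900, 2500, 6000]
    coverage = [0, 5, 30, 22, 70, 95, 100, 100]
    rho = spearman_rho(viral_load, coverage)
    print("rho =", round(rho, 3), "95% CI =", bootstrap_ci(viral_load, coverage))
```

    With only 23 samples, a bootstrap interval will be wide; an exact or permutation-based approach may be preferable, and the choice of interval method should be stated in the figure legends.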

    3. Arbitrary Threshold for Contamination Identification

    Critique: The 5% nucleotide divergence threshold used to define contamination lacks empirical validation. Given known HBV subgenotype divergence (4–8%), the threshold risks misclassifying true biological variants. The authors provide no empirical analysis of sequencing error rates, no justification for the threshold selection, and no validation against a phylogenetic analysis of known HBV strains. This arbitrary threshold choice fundamentally affects the interpretation of contamination versus legitimate viral diversity.

    Framework for Addressing:

    • Conduct phylogenetic analysis to empirically calibrate divergence thresholds.

    • Compare contamination classifications across multiple cutoffs (3%, 5%, 7%, 10%) to test robustness.

    • Incorporate platform-specific error profiles using known standards to better ground threshold decisions.

    • Include supplemental trees showing the relationship of detected sequences to HBV references.
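    The cutoff comparison suggested above could be sketched as follows. This is a hedged example, not the authors' pipeline: it assumes that each sample has a consensus sequence already aligned to its expected reference, and that a sample is flagged as possible contamination when its uncorrected pairwise distance exceeds the cutoff. The manuscript's actual rule may differ, and all names and values here are hypothetical.

```python
VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of mismatches over aligned positions where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:  # skip gaps and ambiguity codes
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def flag_across_cutoffs(distances, cutoffs=(0.03, 0.05, 0.07, 0.10)):
    """Map each cutoff to the sorted sample IDs whose distance exceeds it."""
    return {c: sorted(s for s, d in distances.items() if d > c) for c in cutoffs}


def unstable_calls(flags):
    """Sample IDs whose classification changes across the tested cutoffs."""
    sets = [set(ids) for ids in flags.values()]
    return sorted(set.union(*sets) - set.intersection(*sets))


if __name__ == "__main__":
    # Hypothetical divergences from the expected reference, for illustration only.
    distances = {"s1": 0.02, "s2": 0.06, "s3": 0.12}
    flags = flag_across_cutoffs(distances)
    for cutoff, ids in flags.items():
        print(f"cutoff {cutoff:.2f}: flagged {ids}")
    print("classification depends on cutoff for:", unstable_calls(flags))
```

    Samples returned by `unstable_calls` are the ones whose interpretation hinges on the threshold choice, so they are the natural candidates for the supplemental phylogenetic trees.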

    MINOR REVISIONS

    1. Lack of Reference Standardization in Mapping

    Critique: Variation in reference sequences across protocols may skew assembly metrics and coverage comparisons.

    Framework for Addressing:

    • Standardize the reference genome used across methods or provide justification for divergence.

    • Document reference accession numbers and parameters used in mapping.

    • Conduct sensitivity analyses to demonstrate impact of reference choice.

    2. Incorrect Log-Scale Notation in Figures

    Critique: Figures use "<100" on log-transformed axes, which is mathematically misleading and visually confusing.

    Framework for Addressing:

    • Update axes to reflect accurate log-scale notation (e.g., "<1" or "10⁰").

    • Double-check all figures for mathematical consistency and precision in labeling.

    3. Post-hoc Contamination Thresholds

    Critique: The 10-base minimum threshold for consensus calling is applied post hoc and lacks validation, which is particularly problematic for low viral load samples.

    Framework for Addressing:

    • Establish minimum coverage thresholds based on empirical LOD studies and known viral controls.

    • Use ROC curves to optimize sensitivity/specificity tradeoffs.

    • Avoid retrospective thresholding; instead, predefine and justify thresholds before analysis.
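    One way to predefine such a threshold is sketched below, assuming a separate validation set of known positive and known negative controls with an observed read depth for each. The sketch scans candidate thresholds and picks the one that maximises Youden's J (sensitivity + specificity − 1). The function names and the example depths are hypothetical and do not come from the study.

```python
def roc_points(depths, is_positive):
    """Return (threshold, sensitivity, specificity) for each candidate threshold.

    A sample is called positive when its depth is >= the threshold.
    """
    n_pos = sum(1 for label in is_positive if label)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative control")
    points = []
    for t in sorted(set(depths)):
        tp = sum(1 for d, label in zip(depths, is_positive) if label and d >= t)
        fp = sum(1 for d, label in zip(depths, is_positive) if not label and d >= t)
        points.append((t, tp / n_pos, 1 - fp / n_neg))
    return points


def best_threshold(depths, is_positive):
    """Pick the threshold with the largest Youden's J; ties go to the lowest threshold."""
    return max(roc_points(depths, is_positive), key=lambda p: p[1] + p[2] - 1)


if __name__ == "__main__":
    # Hypothetical read depths for three negative and three positive controls.
    depths = [2, 5, 8, 15, 30, 60]
    is_positive = [0, 0, 0, 1, 1, 1]
    threshold, sensitivity, specificity = best_threshold(depths, is_positive)
    print(threshold, sensitivity, specificity)
```

    The threshold selected this way should be fixed on the validation controls before it is applied to the study samples, so that it is not tuned to the results it is meant to filter.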

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they used generative AI to come up with new ideas for their review.