Next-Generation Sequencing Methods for Sensitive Hepatitis B Viral Genome Analysis: A European Study

Abstract

Objectives

This multicentre study investigated the utility of next-generation sequencing (NGS) to detect and generate hepatitis B virus (HBV) genomes in samples of low viral load (from 0.2 to 6207 IU/ml).

Methods

23 HBV DNA positive plasma samples of genotypes A-E and one HBV-negative control sample were assayed blindly via 9 established NGS methods from 6 European laboratories. Methods included untargeted metagenomics, pre-enrichment by probe-capture followed by Illumina sequencing, and HBV-specific PCR pre-amplification followed by sequencing with Nanopore or Illumina.

Results

Full HBV genomes were obtained only from samples with viral loads >1000 IU/ml using probe-capture methods, >200 IU/ml using PCR-Illumina methods, >10 IU/ml using PCR-Nanopore methods, and in no samples using metagenomic methods. Contamination was observed in the negative control and samples with very low viral loads in all PCR-based methods. Probe-capture and metagenomic methods detected additional viruses not routinely screened in blood donations, including polyomaviruses and herpesviruses; positive results were confirmed by PCR.

Conclusions

NGS may delineate whole-genome sequences at low viral loads if supported by a PCR pre-amplification step. Probe-capture methods also reliably detect HBV without pre-amplification but achieve limited genome characterisation at low viral loads; they may additionally detect a wide range of blood-borne viruses.

This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16945679.

PAPER SUMMARY

This multicenter comparative study tackles an important and timely challenge: evaluating next-generation sequencing (NGS) methods for HBV genome detection in low viral load samples. The authors explore a diverse range of approaches (untargeted metagenomics, probe capture, and PCR-based methods on Illumina and Oxford Nanopore platforms) across six laboratories. The work is ambitious and potentially impactful, especially in clinical contexts where sensitive and specific HBV detection is critical. However, substantial issues with contamination, data interpretation, and statistical methodology currently limit the reliability of the comparative conclusions. The study lays strong groundwork for method optimization and, with targeted revisions, can offer valuable insight into the future of HBV diagnostics.

MAJOR REVISIONS

1. Contamination Across PCR-Based Methods

Critique:

The detection of HBV sequences with 48-100% genome coverage in negative controls across all four PCR-based protocols represents a fundamental quality control failure that must be addressed to preserve the study's scientific validity. A false positive result in a negative control calls for investigation and a review of method validation; without these, the entire analytical run is potentially invalid. The high genome coverage of the false positives points to contamination and needs to be addressed.

Framework for Addressing:

Acknowledge the limitation explicitly and transparently in the manuscript.
Reprocess samples with enhanced contamination control protocols (separate spaces, extraction blanks, environmental testing).
Delay comparative performance claims for PCR methods until a contamination-free dataset is available.
Increase the number and distribution of negative controls to monitor contamination throughout runs.

2. Statistical Misapplication in Correlation Analyses

Critique:

Figures 1, 2, and 4 contain a statistical reporting error that must be corrected before the correlation analyses can be interpreted accurately. The figures display R² values for Spearman correlation analyses, which is methodologically incorrect: Spearman correlation measures the strength of a monotonic relationship and should be reported as the correlation coefficient ρ (rho), not as a coefficient of determination. Some figures display negative R² values, which cannot arise from squaring a correlation coefficient and further confirm that the notation is wrong. Moreover, Spearman's ρ quantifies association; it does not by itself establish significance.

Framework for Addressing:

Replace all R² values with correctly reported Spearman ρ (rho) values and include 95% CIs.
Apply censored data models (e.g., Tobit regression or Kaplan-Meier) to properly handle non-detects.
For diagnostic performance, consider ROC analyses and logistic regression to assess detection probability versus viral load.
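As an illustration of the first recommendation, the following is a minimal sketch of reporting Spearman ρ with a 95% CI. It uses only the Python standard library and a percentile bootstrap, which is one of several reasonable interval methods and not necessarily the one the authors would choose. The viral loads and genome coverage values are hypothetical placeholders, not data from the preprint, and the function names are illustrative.

```python
import random


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for rho, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman_rho([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # drop NaN from degenerate resamples
            estimates.append(rho)
    estimates.sort()
    low = estimates[int((alpha / 2) * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return low, high


# Hypothetical data: viral load (IU/ml) and genome coverage (%).
viral_load = [0.2, 5, 12, 80, 250, 1100, 6207]
coverage = [0, 10, 3, 35, 60, 100, 92]

rho_hat = spearman_rho(viral_load, coverage)
ci_low, ci_high = bootstrap_ci(viral_load, coverage)
print(f"Spearman rho = {rho_hat:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

Unlike a squared coefficient, ρ keeps its sign, so a negative value is a legitimate result that indicates an inverse monotonic association. With a sample this small the bootstrap interval is wide and should be read as a rough indication only.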

3. Arbitrary Threshold for Contamination Identification

Critique: The 5% nucleotide divergence threshold used to define contamination lacks empirical validation. Given known HBV subgenotype divergence (4–8%), the threshold risks misclassifying true biological variants. The authors provide no empirical analysis of sequencing error rates, no justification for the choice of threshold, and no validation against a phylogenetic analysis of known HBV strains. This arbitrary threshold fundamentally affects whether a sequence is interpreted as contamination or as legitimate viral diversity.

Framework for Addressing:

Conduct phylogenetic analysis to empirically calibrate divergence thresholds.
Compare contamination classifications across multiple cutoffs (3%, 5%, 7%, 10%) to test robustness.
Incorporate platform-specific error profiles using known standards to better ground threshold decisions.
Include supplemental trees showing the relationship of detected sequences to HBV references.
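The robustness check across cutoffs could be sketched as follows. This is a minimal example, assuming that a sequence is flagged as contamination when its uncorrected pairwise distance (p-distance) to the expected sequence for that sample exceeds the cutoff; the preprint's exact rule may differ. The sample names and distances are hypothetical, and the function names are illustrative.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing sites among aligned positions where both
    sequences carry an unambiguous base (A, C, G or T)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def classify_across_cutoffs(distances, cutoffs=(0.03, 0.05, 0.07, 0.10)):
    """For each cutoff, list the samples whose distance exceeds it."""
    return {
        cutoff: sorted(name for name, d in distances.items() if d > cutoff)
        for cutoff in cutoffs
    }


# Toy aligned pair: one mismatch over nine comparable sites (N is skipped).
example_distance = p_distance("ACGTACGTAC", "ACGTTCGTNC")

# Hypothetical distances between each sample's consensus and its expected sequence.
distances = {"S01": 0.01, "S02": 0.045, "S03": 0.06, "S04": 0.12}
flagged = classify_across_cutoffs(distances)
for cutoff, names in flagged.items():
    print(f"cutoff {cutoff:.0%}: flagged {names}")
```

Samples whose classification changes between cutoffs (S02 and S03 in this toy data) are the ones for which the threshold choice drives the conclusion, and they are the natural candidates for phylogenetic follow-up.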

MINOR REVISIONS

1. Lack of Reference Standardization in Mapping

Critique: Variation in reference sequences across protocols may skew assembly metrics and coverage comparisons.

Framework for Addressing:

Standardize the reference genome used across methods or provide justification for divergence.
Document reference accession numbers and parameters used in mapping.
Conduct sensitivity analyses to demonstrate impact of reference choice.

2. Incorrect Log-Scale Notation in Figures

Critique: Figures use "<100" on log-transformed axes, which is mathematically misleading and visually confusing.

Framework for Addressing:

Update axes to reflect accurate log-scale notation (e.g., "<1" or "10⁰").
Double-check all figures for mathematical consistency and precision in labeling.

3. Post-hoc Contamination Thresholds

Critique: The 10-base minimum threshold for consensus calling is applied post hoc and lacks validation, which is particularly problematic for low viral load samples.

Framework for Addressing:

Establish minimum coverage thresholds based on empirical LOD studies and known viral controls.
Use ROC curves to optimize sensitivity/specificity tradeoffs.
Avoid retrospective thresholding; instead, predefine and justify thresholds before analysis.
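A predefined threshold could be wired into consensus calling along these lines. This is a minimal sketch that interprets the 10-base minimum as a per-position read depth, which is an assumption about the authors' pipeline; the pileup is hypothetical, and the value of `MIN_DEPTH` would still need the empirical justification described above.

```python
from collections import Counter

# Fixed before any data are analysed; 10 mirrors the value under review
# and is not itself validated here.
MIN_DEPTH = 10


def call_consensus(pileup, min_depth=MIN_DEPTH):
    """Call the majority base at each position, masking with 'N' any
    position covered by fewer than min_depth unambiguous bases.

    pileup: list of strings, each holding the bases observed at one position.
    """
    consensus = []
    for bases in pileup:
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def called_fraction(consensus):
    """Fraction of positions with a called (non-N) base."""
    return 1 - consensus.count("N") / len(consensus)


# Hypothetical four-position pileup with depths 12, 9, 12 and 3.
pileup = ["A" * 12, "C" * 9, "G" * 10 + "T" * 2, "T" * 3]

strict = call_consensus(pileup)                # uses the predefined MIN_DEPTH
lenient = call_consensus(pileup, min_depth=3)  # shown only for comparison
print(strict, called_fraction(strict))
print(lenient, called_fraction(lenient))
```

Reporting the called fraction at the predefined threshold, alongside a sensitivity comparison such as the lenient call, makes it visible how much of a low viral load genome depends on the cutoff.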

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they used generative AI to come up with new ideas for their review.
