Assessment of Inter-Laboratory Differences in SARS-CoV-2 Consensus Genome Assemblies between Public Health Laboratories in Australia

This article has been reviewed by the following groups

Abstract

Whole-genome sequencing of viral isolates is critical for understanding transmission patterns and the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability, as in the early stages of a pandemic, technical and/or sequencing errors have a greater impact. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and the resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Most of the differences identified were driven by ambiguous site content; when ambiguous sites were ignored, differences remained in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant (SNV) distances may lead to significant differences in the number of defined clusters if laboratories use different variant allele frequency thresholds for consensus genome generation. These results underscore the need for a unified, best-practices approach to bioinformatics among laboratories working on a common outbreak problem.
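
The comparisons described in the abstract (pairwise identity, differences that remain after ignoring ambiguous sites, and SNV-distance clustering) can be sketched in Python. This is a minimal illustration, not the authors' method: it assumes the consensus genomes are already aligned to the same length, treats any character other than A, C, G or T as ambiguous, and uses single-linkage clustering at a hypothetical SNV threshold. The names `compare` and `cluster_by_snv` are illustrative only.

```python
from itertools import combinations

# Assumption: anything other than A/C/G/T (N, IUPAC codes such as R or Y, gaps)
# counts as an ambiguous site.
UNAMBIGUOUS = set("ACGT")


def compare(seq_a: str, seq_b: str) -> dict:
    """Compare two consensus genomes that are already aligned to the same length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    # Raw identity: every mismatching column counts, including ambiguous ones.
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    # SNV distance: only columns where both bases are unambiguous and differ.
    snvs = sum(
        a != b and a in UNAMBIGUOUS and b in UNAMBIGUOUS
        for a, b in zip(seq_a, seq_b)
    )
    return {
        "identity_pct": 100.0 * (len(seq_a) - mismatches) / len(seq_a),
        "snv_distance": snvs,
    }


def cluster_by_snv(genomes: dict, threshold: int) -> list:
    """Hypothetical single-linkage clustering: join genomes within `threshold` SNVs."""
    parent = {name: name for name in genomes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if compare(genomes[a], genomes[b])["snv_distance"] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

For example, `compare("ACGTACGTAC", "ACGTNCGTAT")` reports 80% identity but an SNV distance of 1, because the column containing N is excluded from the SNV count. For scale, a single mismatch across the 29,903-nt length of NC_045512.2 corresponds to about 99.997% identity, the same order as the median pairwise identity reported above.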

Article activity feed

  1. SciScore for 10.1101/2021.08.19.21262296

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Each SARS-CoV-2-positive extract is also sent to the Institute of Clinical Pathology and Medical Research (ICPMR), NSW Health Pathology-West, NSW, Australia, for WGS according to their established protocols (8).
    WGS
    suggested: None
    Library preparation was carried out using an Illumina Nextera XT Kit, followed by sequencing on an Illumina iSeq or MiniSeq (150 cycles).
    MiniSeq
    suggested: None
    Clean reads were then mapped to the NCBI RefSeq assembly of SARS-CoV-2 (NC_045512.2) using bwa mem v0.7.17-r1188 (26), with unmapped reads discarded, and primer sequences were soft-clipped from the alignment using ivar trim v.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    Alignments were converted to pileup format using samtools mpileup v1.10 (27) without discarding anomalous read pairs (-A), per-base alignment quality disabled (-B), and no minimum PHRED quality for bases (-Q 0).
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Demultiplexed raw sequencing data from Lab2 were quality trimmed using Trimmomatic (v0.36, sliding window of 4, minimum read quality score of 20, leading/trailing quality of 5) (29).
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
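
The read-processing steps quoted above can be illustrated with a simplified Python sketch: quality trimming with the Trimmomatic settings reported for Lab2, then counting bases from the read-bases column of a samtools mpileup line and calling a consensus base at an allele-frequency threshold. This is not either laboratory's pipeline. `trim_read` only approximates Trimmomatic's LEADING, TRAILING and SLIDINGWINDOW steps: it assumes Phred+33 qualities, applies the steps in that order, treats the reported minimum quality of 20 as the sliding-window average threshold, and cuts at the start of the first failing window. The threshold (0.75), minimum depth (10) and IUPAC ambiguity rule in `call_consensus` are hypothetical choices, because the quoted methods do not state the consensus-calling tool or its settings.

```python
from collections import Counter


def trim_read(seq: str, qual: str, leading: int = 5, trailing: int = 5,
              window: int = 4, min_avg: int = 20, offset: int = 33) -> tuple:
    """Approximate LEADING/TRAILING/SLIDINGWINDOW trimming; returns (seq, qual)."""
    q = [ord(ch) - offset for ch in qual]  # decode Phred+33 quality characters
    start, end = 0, len(seq)
    while start < end and q[start] < leading:     # drop low-quality 5' bases
        start += 1
    while end > start and q[end - 1] < trailing:  # drop low-quality 3' bases
        end -= 1
    # Scan 5' to 3'; cut at the first window whose mean quality is below min_avg.
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < min_avg:
            end = i
            break
    return seq[start:end], qual[start:end]


def count_bases(ref_base: str, read_bases: str) -> Counter:
    """Count bases in the read-bases column of one samtools mpileup line."""
    counts = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":            # read start; the next char is its mapping quality
            i += 2
        elif c == "$":          # read end
            i += 1
        elif c in "+-":         # indel after this position: skip +<len><sequence>
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
        else:
            if c in ".,":       # match to the reference on either strand
                counts[ref] += 1
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1
            elif c in "*#":     # placeholder for a deleted base
                counts["*"] += 1
            i += 1              # other symbols (e.g. reference skips) are ignored
    return counts


IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def call_consensus(counts: Counter, threshold: float = 0.75, min_depth: int = 10) -> str:
    """Return the IUPAC code for the fewest most-frequent bases reaching `threshold`."""
    acgt = {b: counts[b] for b in "ACGT" if counts[b] > 0}
    depth = sum(acgt.values())
    if depth == 0 or depth < min_depth:
        return "N"              # too little coverage to call a base
    chosen, cumulative = [], 0
    for base, n in sorted(acgt.items(), key=lambda kv: (-kv[1], kv[0])):
        chosen.append(base)
        cumulative += n
        if cumulative / depth >= threshold:
            break
    return IUPAC["".join(sorted(chosen))]
```

For example, `count_bases("C", "^F.....,,,TTTT$")` gives 8 C and 4 T; `call_consensus` returns `Y` at a 0.75 threshold but `C` at 0.6, showing how a different allele-frequency threshold alone can turn a called base into ambiguous site content.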

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.