Whole Genome Sequencing of SARS-CoV-2: Adapting Illumina Protocols for Quick and Accurate Outbreak Investigation During a Pandemic


Abstract

The COVID-19 pandemic spread rapidly around the world. A few days after the first case was detected in South Africa, an infection started a large hospital outbreak in Durban, KwaZulu-Natal. Phylogenetic analysis of SARS-CoV-2 genomes can be used to trace the path of transmission within a hospital. It can also identify the source of an outbreak and provide lessons for improving infection prevention and control strategies. In this manuscript, we outline the obstacles we encountered while genotyping SARS-CoV-2 in real time during an urgent outbreak investigation. These included the length of the original genotyping protocol, reagent stockouts, and sample degradation and storage. Nevertheless, we set up three different library preparation methods for sequencing on Illumina platforms. We also reduced the hands-on library preparation time from twelve to three hours, which allowed us to complete the outbreak investigation in just a few weeks. In addition, we fine-tuned a simple bioinformatics workflow for assembling high-quality genomes in real time. So that other laboratories can learn from our experience, we released all of the library preparation and bioinformatics protocols publicly and distributed them to other laboratories of the South African Network for Genomics Surveillance (SANGS) consortium.
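The abstract does not state how genome quality was judged. As a hedged illustration only, a common first check on an assembled consensus is its completeness: the fraction of reference positions called as an unambiguous base rather than N. The sketch below assumes a single-record FASTA string, uses the 29,903-nt length of the NC_045512.2 reference as the denominator, and applies a hypothetical 90% threshold; the function names and threshold are illustrative assumptions, not the authors' workflow.

```python
# Minimal sketch of a consensus-genome completeness check.
# ASSUMPTIONS: the 90% threshold and all function names are hypothetical
# illustrations, not the criteria used in the study.

REFERENCE_LENGTH = 29_903  # length (nt) of the SARS-CoV-2 reference NC_045512.2


def read_fasta_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single-record FASTA string, skipping the header."""
    return "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    )


def genome_completeness(consensus: str, reference_length: int = REFERENCE_LENGTH) -> float:
    """Fraction of reference positions called as an unambiguous base (A, C, G or T)."""
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / reference_length


def passes_qc(consensus: str, min_completeness: float = 0.9,
              reference_length: int = REFERENCE_LENGTH) -> bool:
    """True if the consensus covers at least `min_completeness` of the reference."""
    return genome_completeness(consensus, reference_length) >= min_completeness


if __name__ == "__main__":
    toy = ">sample_1\nACGTNNAC\nGTACGTNN\n"  # toy 16-nt record, not real data
    seq = read_fasta_sequence(toy)
    print(genome_completeness(seq, reference_length=16))  # 0.75 (12 of 16 bases called)
    print(passes_qc(seq, reference_length=16))  # False
```

Because completeness is divided by the reference length, a consensus containing insertions could in principle exceed 1.0; a real pipeline would usually compute it on a reference-aligned consensus.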

Article activity feed

  1. SciScore for 10.1101/2020.06.10.144212

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: All mutations were confirmed visually with BAM files using Geneious software.
    Resource: Geneious; suggested: (Geneious, RRID:SCR_010519)

    Sentence: Lineage assignments were established using a dynamic lineage classification method proposed by Rambaut et al. [18] via the Phylogenetic Assignment of named Global Outbreak LINeages (PANGOLIN) software suite (https://github.com/hCoV-2019/pangolin). 10,959 GISAID reference genomes (all authors are acknowledged in Supplementary Table S6) and 54 KRISP sequences were aligned in MAFFT v7.313 (FFT-NS-2), followed by manual inspection and editing in the Geneious Prime software suite (Biomatters Ltd, New Zealand).
    Resource: Mafft; suggested: (MAFFT, RRID:SCR_011811)

    Sentence: The resulting phylogeny was viewed and annotated in FigTree and ggtree.
    Resource: FigTree; suggested: (FigTree, RRID:SCR_008515)

    Sentence: All of the data produced have been deposited: consensus genomes in GISAID and FASTQ short reads in the Short Read Archive (SRA), with accession: https://www.ncbi.nlm.nih.gov/nuccore/NC045512
    Resource: Short Read Archive; suggested: None

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has many limitations. Firstly, we did not have time to prepare properly for the initial sequencing, as our access to the first positive samples came during a large nosocomial outbreak investigation. Secondly, sample quality was not homogeneous, as some samples arrived at our laboratories weeks after being collected from the patients. Thirdly, reagent stockouts were common during the lockdown in South Africa, and we had to innovate and adapt the protocols. To summarise, despite the difficulties posed by the lockdown, we were able to complete the data generation and analysis of a large COVID-19 outbreak in South Africa in just a few weeks. We also evaluated three library preparation kits for quality, cost, ease of use and time efficiency. In addition, we adapted a bioinformatics workflow to assemble SARS-CoV-2 genomes from raw sequence reads in near-real time. All of our protocols and raw data have been made publicly available and distributed to laboratories of the South African Network for Genomics Surveillance (SANGS) and the Africa Centres for Disease Control and Prevention (Africa CDC).

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.