Whole Genome Sequencing of SARS-CoV-2: Adapting Illumina Protocols for Quick and Accurate Outbreak Investigation During a Pandemic

Abstract

The COVID-19 pandemic spread rapidly around the world. A few days after the first case was detected in South Africa, an infection started a large hospital outbreak in Durban, KwaZulu-Natal. Phylogenetic analysis of SARS-CoV-2 genomes can be used to trace the path of transmission within a hospital, identify the source of an outbreak and provide lessons for improving infection prevention and control strategies. In this manuscript, we outline the obstacles we encountered while genotyping SARS-CoV-2 in real time during an urgent outbreak investigation. These included the length of the original genotyping protocol, reagent stockouts, and sample degradation and storage. Despite these obstacles, we set up three different library preparation methods for Illumina sequencing and reduced the hands-on library preparation time from twelve to three hours, which allowed us to complete the outbreak investigation in just a few weeks. We also fine-tuned a simple bioinformatics workflow for assembling high-quality genomes in real time. So that other laboratories can learn from our experience, we released all of the library preparation and bioinformatics protocols publicly and distributed them to other laboratories of the South African Network for Genomics Surveillance (SANGS) consortium.

SciScore for 10.1101/2020.06.10.144212:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
All mutations were confirmed visually with bam files using Geneious software.	Geneious suggested: (Geneious, RRID:SCR_010519)
Lineage assignments were established using a dynamic lineage classification method proposed by Rambaut et al. [18] via the Phylogenetic Assignment of named Global Outbreak LINeages (PANGOLIN) software suite (https://github.com/hCoV-2019/pangolin). 10,959 GISAID reference genomes (all authors acknowledged in Supplementary Table S6) and 54 KRISP sequences were aligned in Mafft v7.313 (FFT-NS-2), followed by manual inspection and editing in the Geneious Prime software suite (Biomatters Ltd, New Zealand).	Mafft suggested: (MAFFT, RRID:SCR_011811)
The resulting phylogeny was viewed and annotated in FigTree and ggtree.	FigTree suggested: (FigTree, RRID:SCR_008515)
All of the data produced have been deposited: the consensus genomes in GISAID and the fastq short reads in the Short Read Archive (SRA), with accession: https://www.ncbi.nlm.nih.gov/nuccore/NC045512	Short Read Archive suggested: None
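
The sentences above describe aligning genomes and confirming mutations by eye. As a rough illustration only, and not the authors' workflow, the sketch below lists nucleotide substitutions between a sample and a reference taken from the same multiple sequence alignment. The function name `list_mutations`, the toy sequences and the choice to ignore gaps and ambiguous bases are all assumptions made for this example.

```python
from typing import List

# Only unambiguous bases are compared; N, other IUPAC codes and gaps are skipped.
VALID_BASES = set("ACGT")


def list_mutations(ref: str, seq: str) -> List[str]:
    """Return substitutions (e.g. 'G3T') between two rows of one alignment.

    Positions are 1-based and counted on the ungapped reference, so columns
    where the reference has a gap (insertions in the sample) are skipped.
    """
    if len(ref) != len(seq):
        raise ValueError("sequences must come from the same alignment")
    mutations = []
    ref_pos = 0
    for r, s in zip(ref.upper(), seq.upper()):
        if r == "-":
            continue  # no reference coordinate for this column
        ref_pos += 1
        if r in VALID_BASES and s in VALID_BASES and r != s:
            mutations.append(f"{r}{ref_pos}{s}")
    return mutations


# Hypothetical toy alignment, not real SARS-CoV-2 data.
alignment = {
    "reference": "ACGT-ACGTAC",
    "sample_1": "ACTTTACGNAC",
    "sample_2": "ACGT-AC-TAT",
}

for name in ("sample_1", "sample_2"):
    print(name, list_mutations(alignment["reference"], alignment[name]))
```

A list produced this way could only serve as a starting point. Each candidate would still need to be checked against the read-level evidence, as the manuscript reports doing with bam files.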

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Our study has many limitations. Firstly, we did not have time to prepare properly for the initial sequencing, as our access to the first positive samples came during a large nosocomial outbreak investigation. Secondly, the quality of the samples was not homogeneous, as some samples arrived at our laboratories weeks after being collected from the patients. Thirdly, reagent stockouts were common during the lockdown in South Africa and we had to innovate and adapt the protocols. To summarise, despite the difficulties posed by the lockdown, we were able to complete the data generation and analysis of a large COVID-19 outbreak in South Africa in just a few weeks. We also evaluated the performance of three library preparation kits for their quality, cost, ease of use and time efficiency. In addition, we adapted a bioinformatics workflow to assemble SARS-CoV-2 genomes from raw sequence reads in near-real time. All of our protocols and raw data have been made publicly available and distributed to laboratories of the South African Network for Genomics Surveillance (SANGS) and the Africa Centres for Disease Control and Prevention (Africa CDC).

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
