High-throughput SARS-CoV-2 and host genome sequencing from single nasopharyngeal swabs

J. E. Gorzynski
H. N. De Jong
D. Amar
C. Hughes
A. Ioannidis
R. Bierman
D. Liu
Y. Tanigawa
A. L. Kistler
J. Kamm
J. Kim
L. Cappello
N. F. Neff
S. Rubinacci
O. Delaneau
M. J. Shoura
K. Seo
A. Kirillova
A. Raja
S. Sutton
C. Huang
M. K. Sahoo
K. C. Mallempati
G. Montero-Martin
K. Osoegawa
N. Watson
N. Hammond
R. Joshi
M. A. Fernández-Viña
J. W. Christle
M. T. Wheeler
P. Febbo
K. Farh
G. P. Schroth
F. DeSouza
J. Palacios
J. Salzman
B. A. Pinsky
M. A. Rivas
C. D. Bustamante
E. A. Ashley
V. N. Parikh

This article has been Reviewed by the following groups

Listed in

Evaluated articles (ScreenIT)

Abstract

During COVID-19 and other viral pandemics, rapid generation of host and pathogen genomic data is critical for tracking infection and informing therapies. There is an urgent need for efficient approaches to generating these data at scale. We have developed a scalable, high-throughput approach that generates high-fidelity low-pass whole-genome and HLA sequencing, viral genomes, and a representation of the human transcriptome from single nasopharyngeal swabs of COVID-19 patients.
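"Low-pass" whole-genome sequencing means reading the host genome at low mean depth and recovering genotypes afterwards by imputation (the methods list imputation alongside variant calling). As a rough sketch of what low depth implies, the snippet below uses the classic Lander-Waterman/Poisson model, which assumes reads land uniformly at random; the read count, read length, and genome size in the example are hypothetical and not taken from the paper.

```python
import math


def expected_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean fold-coverage C = N * L / G, assuming uniformly placed reads."""
    return n_reads * read_length_bp / genome_size_bp


def fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by >= 1 read under a Poisson model: 1 - e^(-C)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical run: 10 million 150 bp reads against a ~3.1 Gb human genome.
    depth = expected_depth(10_000_000, 150, 3_100_000_000)
    print(f"mean depth {depth:.2f}x, {fraction_covered(depth):.1%} of bases covered")
```

With paired-end data each mate counts toward N. Duplicates and unmappable regions usually push the depth reported by tools such as Mosdepth or Qualimap below this idealized estimate.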

SciScore for 10.1101/2020.07.27.20163147

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Sample Collection and diagnostics: Residual viral transport medium (VTM) from SARS-CoV-2-positive nasopharyngeal swabs collected during clinical assessment of asymptomatic and symptomatic patients at Stanford Healthcare was used.	Stanford Healthcare suggested: None
Non-SARS-CoV-2 reads were filtered out with Kraken2 [20], using an index of human and viral genomes in RefSeq (index downloaded from https://genexa.ch/sars2-bioinformatics-resources/).	RefSeq suggested: (RefSeq, RRID:SCR_003496)
Reads per SARS-CoV-2 gene were collected from the STAR ReadsPerGene output file, and the total mappable reads were collected from the STAR Log file.	STAR suggested: (STAR, RRID:SCR_015899)
500 μl of 1.3 pM DNA sequencing library was loaded into a MiniSeq Mid Output Kit (300 cycles; FC-420-1004) and sequenced on a MiniSeq DNA sequencer (Illumina Inc., San Diego, CA).	MiniSeq suggested: None
Host Sequence Alignment: Low-coverage FASTQ sequences underwent quality control assessment with FastQC v0.11.8 before alt-aware alignment to GRCh38.p12 using BWA-MEM v0.7.17-r1188.	FastQC suggested: (FastQC, RRID:SCR_014583); BWA-MEM suggested: (Sniffles, RRID:SCR_017619)
After duplicate marking, base quality score recalibration was performed with Picard Tools’ BaseRecalibrator and high-confidence variant call sets from dbSNP and the 1000 Genomes Project.	Picard suggested: (Picard, RRID:SCR_006525); dbSNP suggested: (dbSNP, RRID:SCR_002338); 1000 Genomes Project suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
Quality control metrics, including coverage, were generated with Qualimap BAMQC v2.2.1, Samtools v1.10, and Mosdepth v0.2.9.	Qualimap suggested: (QualiMap, RRID:SCR_001209); Samtools suggested: (SAMTOOLS, RRID:SCR_002105)
Finally, quality control reports for each sample were aggregated using MultiQC v1.9.	MultiQC suggested: (MultiQC, RRID:SCR_014982)
Variant Calling, Imputation, PCA, Kinship: BAM files were used for an initial calling with bcftools v1.9 mpileup [29].	bcftools suggested: (SAMtools/BCFtools, RRID:SCR_005227)
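The Kraken2 step keeps only reads assigned to SARS-CoV-2. A minimal sketch of such a filter over Kraken2's standard per-read output (tab-separated: C/U flag, read ID, taxonomy ID, read length, k-mer LCA mapping) is shown below. It assumes reads are kept only when assigned exactly to NCBI taxid 2697049 (SARS-CoV-2); the report does not say which taxa the authors retained, and the helper names are hypothetical.

```python
import re
from typing import Iterable

SARS_COV_2_TAXID = 2697049  # NCBI Taxonomy ID for SARS-CoV-2

# With --use-names, Kraken2 writes the taxon as "Name (taxid N)" instead of a bare number.
_TAXID_IN_NAME = re.compile(r"\(taxid (\d+)\)\s*$")


def parse_taxid(field: str) -> int:
    """Return the taxid from a plain ("2697049") or named ("... (taxid 2697049)") field."""
    match = _TAXID_IN_NAME.search(field)
    return int(match.group(1)) if match else int(field)


def sars_cov_2_read_ids(kraken_lines: Iterable[str], taxid: int = SARS_COV_2_TAXID) -> set:
    """Collect IDs of classified reads whose assigned taxid equals `taxid`."""
    keep = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, tax_field = fields[0], fields[1], fields[2]
        if status == "C" and parse_taxid(tax_field) == taxid:
            keep.add(read_id)
    return keep


if __name__ == "__main__":
    example = [
        "C\tread1\t2697049\t150|150\t2697049:116\n",
        "C\tread2\t9606\t150|150\t9606:116\n",
        "U\tread3\t0\t150|150\t0:116\n",
    ]
    print(sorted(sars_cov_2_read_ids(example)))
```

The exact-match rule drops reads whose lowest common ancestor sits at a higher rank (for example, genus level). The resulting ID list could then be passed to a read extractor such as `seqtk subseq`.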

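The STAR row pairs per-gene counts from the ReadsPerGene output with a total of mapped reads. Those are the two inputs a reads-per-kilobase-per-million (RPKM) normalization needs: RPKM = count × 10⁹ / (gene length in bp × total mapped reads). The sketch below assumes STAR's standard `ReadsPerGene.out.tab` layout: four `N_` summary rows, then gene ID, unstranded count, and two strand-specific counts. The gene lengths and read totals in the example are illustrative placeholders, not values from the paper.

```python
import io
from typing import Dict, TextIO

# STAR writes these four summary rows before the per-gene rows.
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_gene_counts(handle: TextIO, column: int = 1) -> Dict[str, int]:
    """Parse ReadsPerGene.out.tab.

    column=1 -> unstranded counts; 2 and 3 -> the two strand-specific count columns.
    """
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0] in STAR_SUMMARY_ROWS:
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Toy ReadsPerGene.out.tab content (hypothetical numbers).
    table = io.StringIO(
        "N_unmapped\t100\t100\t100\n"
        "N_multimapping\t20\t20\t20\n"
        "N_noFeature\t5\t300\t10\n"
        "N_ambiguous\t0\t0\t0\n"
        "S\t500\t3\t497\n"
        "N\t1200\t10\t1190\n"
    )
    counts = read_star_gene_counts(table)
    lengths = {"S": 3822, "N": 1260}  # illustrative gene lengths in bp
    total_mapped = 2_000_000           # hypothetical total mapped reads from the STAR log
    for gene, n in counts.items():
        print(gene, round(rpkm(n, lengths[gene], total_mapped), 1))
```

Per-sample RPKMs computed this way can then be averaged across samples with similar CT values.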
Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

SciScore for 10.1101/2020.07.27.20163147

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement

Institutional Review Board approval for anonymous sequencing of host and viral genomics was obtained from the Stanford University School of Medicine IRB.

Randomization

not detected.

Blinding

This also enabled confirmation of six blindly duplicated samples and two pairs of first-degree relatives through kinship analysis (Figure 2C).

Power Analysis

not detected.

Sex as a biological variable

not detected.
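The Blinding entry notes that kinship analysis recovered blindly duplicated samples and first-degree relative pairs. The report does not name the kinship estimator. As an illustration only, the sketch below bins a KING-style kinship coefficient φ using the cut-offs commonly used with KING, which are the geometric midpoints between the expected φ of 1/2, 1/4, 1/8, 1/16, and 1/32. The sample IDs and φ values are hypothetical.

```python
from typing import Dict, Tuple

# Geometric midpoints between the expected kinship of successive degrees:
# duplicate/MZ = 1/2, first-degree = 1/4, second = 1/8, third = 1/16, fourth = 1/32.
KINSHIP_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # ~0.354
    (2 ** -2.5, "first-degree"),       # ~0.177
    (2 ** -3.5, "second-degree"),      # ~0.0884
    (2 ** -4.5, "third-degree"),       # ~0.0442
]


def classify_kinship(phi: float) -> str:
    """Bin a kinship coefficient into a relationship degree."""
    for cutoff, label in KINSHIP_CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"


if __name__ == "__main__":
    # Hypothetical pairwise estimates, e.g. from a kinship tool run on imputed genotypes.
    pairs: Dict[Tuple[str, str], float] = {
        ("swab_01", "swab_17"): 0.49,
        ("swab_03", "swab_08"): 0.24,
        ("swab_05", "swab_11"): 0.01,
    }
    for (a, b), phi in pairs.items():
        print(a, b, classify_kinship(phi))
```

A duplicate shows up as φ near 0.5 because a sample is fully related to itself. This is what lets a kinship screen flag blindly duplicated swabs without consulting sample labels.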

Table 2: Resources

Software and Algorithms
Sentences	Resources
The estimated phylogenetic tree is the maximum clade credibility tree obtained with BEAST 2 [15] using a fixed mutation rate of 1.04 × 10⁻³ per base per year, the Coalescent Extended Bayesian Skyline prior [16], and the HKY substitution model [17]. (F) RPKMs for individual SARS-CoV-2 genes were averaged over samples with similar CT values.	BEAST suggested: (BEAST, RRID:SCR_010228)
Methods. Sample Collection and diagnostics: Residual viral transport medium (VTM) from SARS-CoV-2-positive nasopharyngeal swabs collected during clinical assessment of asymptomatic and symptomatic patients at Stanford Healthcare was used.	Stanford Healthcare suggested: None
Non-SARS-CoV-2 reads were filtered out with Kraken2 [20], using an index of human and viral genomes in RefSeq (index downloaded from https://genexa.ch/sars2-bioinformatics-resources/).	RefSeq suggested: (RefSeq, RRID:SCR_003496)
Reads per SARS-CoV-2 gene were collected from the STAR ReadsPerGene output file, and the total mappable reads were collected from the STAR Log file.	STAR suggested: (STAR, RRID:SCR_015899)
500 µl of 1.3 pM DNA sequencing library was loaded into a MiniSeq Mid Output Kit (300 cycles; FC-420-1004) and sequenced on a MiniSeq DNA sequencer (Illumina Inc., San Diego, CA).	MiniSeq suggested: None
… KIR ligands (C1 and C2) and imputed HLA haplotypes [27,28]. Host Sequence Alignment: Low-coverage FASTQ sequences underwent quality control assessment with FastQC v0.11.8 before alt-aware alignment to GRCh38.p12 using BWA-MEM v0.7.17-r1188.	FastQC suggested: (FastQC, RRID:SCR_014583); BWA-MEM suggested: (Sniffles, RRID:SCR_017619)
After duplicate marking, base quality score recalibration was performed with Picard Tools’ BaseRecalibrator and high-confidence variant call sets from dbSNP and the 1000 Genomes Project.	Picard suggested: (Picard, RRID:SCR_006525); dbSNP suggested: (dbSNP, RRID:SCR_002338); 1000 Genomes Project suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
Quality control metrics, including coverage, were generated with Qualimap BAMQC v2.2.1, Samtools v1.10, and Mosdepth v0.2.9.	Qualimap suggested: (QualiMap, RRID:SCR_001209); Samtools suggested: (Samtools, RRID:SCR_002105)
Finally, quality control reports for each sample were aggregated using MultiQC v1.9.	MultiQC suggested: (MultiQC, RRID:SCR_014982)
Variant Calling, Imputation, PCA, Kinship: BAM files were used for an initial calling with bcftools v1.9 mpileup [29].	bcftools suggested: (SAMtools/BCFtools, RRID:SCR_005227)
… Beckman independence fellowship (MJS) and American Heart Association (MJS, KS), the National Heart, Lung, and Blood Institute (K08HL143185 to VNP), the John Taylor Babbitt Foundation (VNP), and Sarnoff Cardiovascular Research Foundation (VNP).	American Heart Association suggested: (American Heart Association, RRID:SCR_007210)

Data from additional tools added to each annotation on a weekly basis.

About SciScore

SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review.
SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead, it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources.
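The BEAST 2 entry in the resources table fixes the clock at 1.04 × 10⁻³ substitutions per site per year. The sketch below is a back-of-the-envelope check of what that rate implies. It assumes the 29,903-nt Wuhan-Hu-1 reference length (NC_045512.2) and a strict clock. It reproduces only the arithmetic of the rate, not BEAST's Bayesian inference, and the helper names are hypothetical.

```python
CLOCK_RATE = 1.04e-3    # substitutions per site per year (the fixed rate in the BEAST 2 run)
GENOME_LENGTH = 29_903  # nt; Wuhan-Hu-1 reference (NC_045512.2), assumed for illustration


def expected_substitutions(years: float, rate: float = CLOCK_RATE,
                           length: int = GENOME_LENGTH) -> float:
    """Expected substitutions accumulated along one lineage over `years`."""
    return rate * length * years


def naive_tmrca_years(pairwise_snps: int, rate: float = CLOCK_RATE,
                      length: int = GENOME_LENGTH) -> float:
    """Strict-clock time back to the common ancestor of two genomes sampled at the same time.

    Both lineages accumulate changes independently, so the expected pairwise
    distance is 2 * rate * length * t; solve for t.
    """
    return pairwise_snps / (2 * rate * length)


if __name__ == "__main__":
    per_year = expected_substitutions(1.0)
    print(f"{per_year:.1f} substitutions/genome/year, {per_year / 12:.2f} per month")
    print(f"6 SNPs apart -> ~{naive_tmrca_years(6) * 365.25:.0f} days to common ancestor")
```

This single-pair calculation ignores rate variation and sampling-time differences. BEAST instead integrates over tree topologies, coalescent priors, and the HKY model while the clock rate is held fixed.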

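The MiniSeq loading step uses 500 µl of library at 1.3 pM. Two related conversions are shown below: turning that concentration and volume into an absolute number of library molecules, and turning a mass concentration into molarity. The sketch assumes an average of 660 g/mol per base pair of double-stranded DNA, a common convention. The fragment length in the example is hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole (exact SI value)
DSDNA_G_PER_MOL_PER_BP = 660.0  # approximate average mass of one dsDNA base pair


def molecules_loaded(conc_pM: float, volume_uL: float) -> float:
    """Number of library molecules in `volume_uL` microlitres at `conc_pM` picomolar."""
    moles = (conc_pM * 1e-12) * (volume_uL * 1e-6)  # mol/L * L
    return moles * AVOGADRO


def ng_per_uL_to_nM(conc_ng_per_uL: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity.

    ng/uL equals mg/L; dividing by the molar mass (g/mol) and rescaling to nmol/L gives
    nM = conc * 1e6 / (660 * fragment length).
    """
    return conc_ng_per_uL * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


if __name__ == "__main__":
    print(f"{molecules_loaded(1.3, 500):.3e} molecules loaded")
    # Hypothetical library: 2 ng/uL with a 400 bp mean fragment size.
    print(f"{ng_per_uL_to_nM(2.0, 400):.2f} nM")
```

At 1.3 pM in 500 µl this comes to roughly 3.9 × 10⁸ molecules. The molarity helper depends on the assumed mean fragment length, so an inaccurate size estimate shifts the computed loading concentration proportionally.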
Version published to 10.1101/2020.07.27.20163147 on medRxiv
Jul 29, 2020
