Deconvoluting complex correlates of COVID-19 severity with a multi-omic pandemic tracking strategy

Abstract

The SARS-CoV-2 pandemic has differentially impacted populations across race and ethnicity. A multi-omic approach represents a powerful tool to examine risk across multi-ancestry genomes. We leverage a pandemic tracking strategy in which we sequence viral and host genomes and transcriptomes from nasopharyngeal swabs of 1049 individuals (736 SARS-CoV-2 positive and 313 SARS-CoV-2 negative) and integrate them with digital phenotypes from electronic health records from a diverse catchment area in Northern California. Genome-wide association disaggregated by admixture mapping reveals novel COVID-19-severity-associated regions containing previously reported markers of neurologic, pulmonary and viral disease susceptibility. Phylodynamic tracking of consensus viral genomes reveals no association with disease severity or inferred ancestry. Summary data from the multi-omic investigation reveal metagenomic and HLA associations with severe COVID-19. The wealth of data available from residual nasopharyngeal swabs, combined with clinical data abstracted automatically at scale, highlights a powerful strategy for pandemic tracking and reveals distinct epidemiologic, genetic, and biological associations for those at the highest risk.

SciScore for 10.1101/2021.08.04.21261547:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Ethics	not detected.
Sex as a biological variable	not detected.
Randomization	Briefly, 3 µl of total nucleic acid was used as input for a randomly primed cDNA synthesis reaction.
Blinding	not detected.
Power Analysis	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Sample Collection and diagnostics: Residual VTM from SARS-CoV-2 positive nasopharyngeal swabs collected during clinical assessment of asymptomatic and symptomatic patients at Stanford Healthcare were used in accordance with the Stanford School of Medicine Institutional Review	Stanford Healthcare suggested: None
Clinical data were obtained through the STAnford Research Repository (STARR), a Stanford Medicine-approved resource for working with clinical data for research purposes, extracted from the Epic database management system used by the Stanford hospitals.	STAnford suggested: (Stanford CNI, RRID:SCR_014529) STARR suggested: (Starr, RRID:SCR_001071)
Viral and Metagenomic Genome Alignment: For SARS-CoV-2 genomes, FASTQ sequences were aligned to the SARS-CoV-2 reference genome NC_045512.2 using minimap2 [36]. Non-SARS-CoV-2 reads were filtered out with Kraken2 [37], using an index of human and viral genomes in RefSeq (index downloaded from https://genexa.ch/sars2-bioinformatics-resources/).	RefSeq suggested: (RefSeq, RRID:SCR_003496)
Host and metagenomic RNA alignment was performed using STAR run against a combined index of the human reference genome GRCh38, SARS-CoV-2 (SARSCoV2_NC_045512.2), and ERCC spike-ins.	STAR suggested: (STAR, RRID:SCR_004463)
Host Genome Sequence Alignment: Low-coverage FASTQ sequences underwent quality control assessment via FastQC v0.11.8 before alt-aware alignment to GRCh38.p12 using BWA-MEM v0.7.17-r1188.	FastQC suggested: (FastQC, RRID:SCR_014583) BWA-MEM suggested: (Sniffles, RRID:SCR_017619)
After duplicate marking, base quality score recalibration was performed with Picard Tools’ BaseRecalibrator and high-confidence variant call sets from dbSNP and the 1000 Genomes Project.	Picard suggested: (Picard, RRID:SCR_006525) dbSNP suggested: (dbSNP, RRID:SCR_002338) 1000 Genomes Project suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
Quality control metrics, including coverage, were generated with Qualimap BAMQC v2.2.1, Samtools v1.10, and Mosdepth v0.2.9.	Qualimap suggested: (QualiMap, RRID:SCR_001209) Samtools suggested: (SAMTOOLS, RRID:SCR_002105)
Finally, quality control reports for each sample were aggregated using MultiQC v1.9	MultiQC suggested: (MultiQC, RRID:SCR_014982)
Reproducible code and steps are available at Protocols.io (https://www.protocols.io/private/8CFBD1AD8FE611EA815E0A58A9FEAC2A). All high confidence calls were contributed to the COVID-19 Host Genetics Initiative [3]. Variant Calling, Imputation, PCA, Kinship: BAM files were used for an initial calling with bcftools v1.9 mpileup.	bcftools suggested: (SAMtools/BCFtools, RRID:SCR_005227)
500 µl of 1.3 pM DNA sequencing library was loaded into a MiniSeq Mid Output Kit (300-cycles) (FC-420-1004), and sequenced using MiniSeq DNA sequencer (Illumina Inc., San Diego, CA).	MiniSeq suggested: None
When self-reported ethnicity was not available, genetic ancestry calculated from the low pass WGS in this study was used as described above.	WGS suggested: None
HLA serotype and allele frequencies were calculated in both Mild and Severe groups, and Odds Ratio (OR:	Mild suggested: (MILD, RRID:SCR_003335)
We assumed the HKY mutation model [57] with default hyperparameter priors in the BEAST2 software [58].	BEAST2 suggested: (BEAST2, RRID:SCR_017307)
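
The HLA row above mentions an odds ratio computed between the Mild and Severe groups, but the quoted sentence is truncated and does not say how it was calculated. As an illustration only, the following minimal sketch shows one common way to get an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table of allele carriers. The function name, argument names, the 0.5 continuity correction, and the example counts are all assumptions for this sketch and are not taken from the manuscript.

```python
import math


def odds_ratio(carriers_severe, noncarriers_severe, carriers_mild, noncarriers_mild):
    """Return (OR, ci_low, ci_high) for a 2x2 carrier-by-severity table.

    Hypothetical helper: the manuscript's actual procedure is not stated
    in the excerpt above.
    """
    cells = [carriers_severe, noncarriers_severe, carriers_mild, noncarriers_mild]
    # Haldane-Anscombe correction (an assumption here): add 0.5 to every
    # cell when any cell is zero, so the ratio and its log stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    estimate = (a * d) / (b * c)
    # Woolf's method: standard error of log(OR) from reciprocal cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(estimate) - 1.96 * se_log)
    ci_high = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, ci_low, ci_high


# Made-up counts: 10 of 30 severe cases and 5 of 45 mild cases carry the allele.
estimate, ci_low, ci_high = odds_ratio(10, 20, 5, 40)
print(f"OR = {estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With small cell counts, an exact test such as Fisher's is often preferred over the Woolf interval; this sketch is only meant to make the arithmetic behind an odds ratio concrete.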

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
