Deconvoluting complex correlates of COVID-19 severity with a multi-omic pandemic tracking strategy

Abstract

The SARS-CoV-2 pandemic has differentially impacted populations across race and ethnicity. A multi-omic approach represents a powerful tool to examine risk across multi-ancestry genomes. We leverage a pandemic tracking strategy in which we sequence viral and host genomes and transcriptomes from nasopharyngeal swabs of 1049 individuals (736 SARS-CoV-2 positive and 313 SARS-CoV-2 negative) and integrate them with digital phenotypes from electronic health records from a diverse catchment area in Northern California. Genome-wide association disaggregated by admixture mapping reveals novel COVID-19-severity-associated regions containing previously reported markers of neurologic, pulmonary and viral disease susceptibility. Phylodynamic tracking of consensus viral genomes reveals no association with disease severity or inferred ancestry. Summary data from multiomic investigation reveals metagenomic and HLA associations with severe COVID-19. The wealth of data available from residual nasopharyngeal swabs in combination with clinical data abstracted automatically at scale highlights a powerful strategy for pandemic tracking, and reveals distinct epidemiologic, genetic, and biological associations for those at the highest risk.

Article activity feed

  1. SciScore for 10.1101/2021.08.04.21261547:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: Briefly, 3 µl of total nucleic acid was used as input for a randomly primed cDNA synthesis reaction.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Sample Collection and diagnostics: Residual VTM from SARS-CoV-2 positive nasopharyngeal swabs collected during clinical assessment of asymptomatic and symptomatic patients at Stanford Healthcare were used in accordance with the Stanford School of Medicine Institutional Review
    Stanford Healthcare
    suggested: None
    Clinical data were obtained through the STAnford Research Repository (STARR), a Stanford Medicine-approved resource for working with clinical data for research purposes, extracted from the Epic database management system used by the Stanford hospitals.
    STAnford
    suggested: (Stanford CNI, RRID:SCR_014529)
    STARR
    suggested: (Starr, RRID:SCR_001071)
    Viral and Metagenomic Genome Alignment: For SARS-CoV-2 genomes, FASTQ sequences were aligned to the SARS-CoV-2 reference genome NC_045512.2 using minimap2 [36]. Non-SARS-CoV-2 reads were filtered out with Kraken2 [37], using an index of human and viral genomes in RefSeq (index downloaded from https://genexa.ch/sars2-bioinformatics-resources/).
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    Host and metagenomic RNA alignment was performed using STAR run against a combined index of the human reference genome GRCh38, SARS-CoV-2 (SARSCoV2_NC_045512.2), and ERCC spike-ins.
    STAR
    suggested: (STAR, RRID:SCR_004463)
    Host Genome Sequence Alignment: Low-coverage FASTQ sequences underwent quality control assessment via FastQC v0.11.8 before alt-aware alignment to GRCh38.p12 using BWA-MEM v0.7.17-r1188.
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    BWA-MEM
    suggested: (Sniffles, RRID:SCR_017619)
    After duplicate marking, base quality score recalibration was performed with Picard Tools’ BaseRecalibrator and high-confidence variant call sets from dbSNP and the 1000 Genomes Project.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    dbSNP
    suggested: (dbSNP, RRID:SCR_002338)
    1000 Genomes Project
    suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
    Quality control metrics, including coverage, were generated with Qualimap BAMQC v2.2.1, Samtools v1.10, and Mosdepth v0.2.9.
    Qualimap
    suggested: (QualiMap, RRID:SCR_001209)
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Finally, quality control reports for each sample were aggregated using MultiQC v1.9
    MultiQC
    suggested: (MultiQC, RRID:SCR_014982)
    Reproducible code and steps are available at Protocols.io doi: (https://www.protocols.io/private/8CFBD1AD8FE611EA815E0A58A9FEAC2A). All high confidence calls were contributed to the COVID19 Host Genetics Initiative [3]. Variant Calling, Imputation, PCA, Kinship: BAM files were used for an initial calling with bcftools v1.9 mpileup.
    bcftools
    suggested: (SAMtools/BCFtools, RRID:SCR_005227)
    500 µl of 1.3 pM DNA sequencing library was loaded into a MiniSeq Mid Output Kit (300-cycles) (FC-420-1004), and sequenced using MiniSeq DNA sequencer (Illumina Inc., San Diego, CA).
    MiniSeq
    suggested: None
    When self-reported ethnicity was not available, genetic ancestry calculated from the low pass WGS in this study was used as described above.
    WGS
    suggested: None
    HLA serotype and allele frequencies were calculated in both Mild and Severe groups, and Odds Ratio (OR:
    Mild
    suggested: (MILD, RRID:SCR_003335)
    We assumed the HKY mutation model [57] with default hyperparameter priors in the BEAST2 software [58].
    BEAST2
    suggested: (BEAST2, RRID:SCR_017307)
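
One of the quoted sentences above mentions comparing HLA allele frequencies between the Mild and Severe groups with an odds ratio, but the sentence is truncated in this report, so the authors' exact procedure is not visible here. As an illustration only, the following minimal sketch computes an odds ratio from a 2x2 table of allele carriers. The function name, the counts in the usage note, the Wald (Woolf) 95% confidence interval and the 0.5 continuity correction are all assumptions, not details taken from the manuscript.

```python
import math


def odds_ratio(severe_carriers, severe_noncarriers,
               mild_carriers, mild_noncarriers, z=1.96):
    """Return (OR, ci_low, ci_high) for a 2x2 carrier-by-severity table.

    Hypothetical helper: uses a Wald (Woolf) interval on the log odds ratio.
    The source report does not say which interval the authors used.
    """
    a, b, c, d = (float(x) for x in (severe_carriers, severe_noncarriers,
                                     mild_carriers, mild_noncarriers))
    # Assumption: add 0.5 to every cell when any cell is zero
    # (Haldane-Anscombe correction) so the ratio and its log are defined.
    if 0.0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    return ratio, low, high
```

With made-up counts of 10 carriers and 20 non-carriers in the Severe group against 5 carriers and 40 non-carriers in the Mild group, `odds_ratio(10, 20, 5, 40)` gives an odds ratio of 4.0 together with its approximate interval. For small cell counts an exact test would usually be preferred over this approximation.
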

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
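
To make the RRID check described above more concrete, here is a minimal sketch of how RRIDs could be pulled out of free text with a regular expression. It is not SciScore's implementation: the pattern, the function name `find_rrids` and the sample text are assumptions for illustration. It only detects identifiers that look like RRIDs; confirming that an identifier is correct would require a lookup against a registry, which is not shown.

```python
import re

# Assumed pattern: "RRID:" followed by an alphabetic registry prefix,
# an underscore, and an alphanumeric identifier (for example SCR_004463).
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9]+)")


def find_rrids(text):
    """Return all RRID-like identifiers found in text, in order of appearance."""
    return RRID_PATTERN.findall(text)


# Sample lines copied from the resource table in this report.
sample = (
    "suggested: (STAR, RRID:SCR_004463)\n"
    "suggested: (FastQC, RRID:SCR_014583)\n"
    "suggested: None\n"
)
rrids = find_rrids(sample)
```

Here `rrids` holds the two identifiers from the first two lines, and the `suggested: None` line contributes nothing.
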