Genetic architecture of host proteins interacting with SARS-CoV-2

Abstract

Strategies to develop therapeutics for SARS-CoV-2 infection may be informed by experimental identification of viral-host protein interactions in cellular assays and measurement of host response proteins in COVID-19 patients. Identification of genetic variants that influence the level or activity of these proteins in the host could enable rapid ‘in silico’ assessment in human genetic studies of their causal relevance as molecular targets for new or repurposed drugs to treat COVID-19. We integrated large-scale genomic and aptamer-based plasma proteomic data from 10,708 individuals to characterize the genetic architecture of 179 host proteins reported to interact with SARS-CoV-2 proteins or to participate in the host response to COVID-19. We identified 220 host DNA sequence variants acting in cis (MAF 0.01-49.9%) and explaining 0.3-70.9% of the variance of 97 of these proteins, including 45 with no previously known protein quantitative trait loci (pQTL) and 38 encoding current drug targets. Systematic characterization of pQTLs across the phenome identified protein-drug-disease links and evidence that putative viral interaction partners such as MARK3 affect immune response, and established the first link between a recently reported variant for respiratory failure of COVID-19 patients at the ABO locus and hypercoagulation, i.e. a maladaptive host response. Our results accelerate the evaluation and prioritization of new drug development programmes and repurposing of trials to prevent, treat or reduce adverse outcomes. Rapid sharing and dynamic and detailed interrogation of results are facilitated through an interactive webserver (https://omicscience.org/apps/covidpgwas/).

Article activity feed

  1. SciScore for 10.1101/2020.07.01.182709:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences / Resources
    Mapping of protein targets across platforms: We mapped each candidate protein to its UniProt-ID (https://www.uniprot.org/) and used those to select mapping aptamers and Olink measures based on annotation files provided by the vendors.
    https://www.uniprot.org/
    suggested: (Universal Protein Resource, RRID:SCR_002380)
    Identification of relevant GWAS traits: To enable linkage to reported GWAS-variants we downloaded all SNPs reported in the GWAS catalog (19/12/2019, https://www.ebi.ac.uk/gwas/) and pruned the list of variant-outcome associations manually to omit previous protein-wide GWAS.
    https://www.ebi.ac.uk/gwas/
    suggested: (GWAS: Catalog of Published Genome-Wide Association Studies, RRID:SCR_012745)
    GO-term annotation within the UniProt database has the advantage of being manually curated while aiming to omit unspecific parent terms.
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    We performed DAVID functional enrichment analyses on all the genes significantly associated (Bonferroni-adjusted p<0.05) with plasma levels of the proteins to identify biological processes (Benjamini-Hochberg adjusted p<0.05) that may explain the associations found beyond the protein encoding genes.
    DAVID
    suggested: (DAVID, RRID:SCR_001881)
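
The DAVID sentence above relies on two different multiple-testing corrections: Bonferroni for the gene-level associations and Benjamini-Hochberg for the enriched biological processes. As a rough illustration of how the two adjustments differ, here is a minimal Python sketch. It is not the authors' code or DAVID's implementation; the function names and the example p-values are hypothetical.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate procedure)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value
    # never exceeds the adjusted value of any larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for illustration only; not taken from the study.
    example = [0.001, 0.008, 0.039, 0.041, 0.60]
    threshold = 0.05
    for name, adjust in (("Bonferroni", bonferroni),
                         ("Benjamini-Hochberg", benjamini_hochberg)):
        adj = adjust(example)
        kept = [p for p, a in zip(example, adj) if a < threshold]
        print(f"{name}: adjusted={adj}, raw p-values passing {threshold}: {kept}")
```

Bonferroni controls the family-wise error rate and is the stricter of the two. Benjamini-Hochberg controls the false discovery rate, so it typically retains more findings at the same nominal threshold.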

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    However, important limitations apply. Firstly, protein abundances have been measured in plasma, which may differ from the intracellular role of proteins, and include purposefully secreted as well as leaked proteins. Secondly, while aptamer-based techniques provide the broadest coverage of the plasma proteome, specificity can be compromised for specific protein targets and evidence using complementary techniques such as Olink or mass spectrometry efforts is useful for validation of signals. Thirdly, in-depth phenotypic characterization of the high-priority cis-pQTLs requires appropriate formal and statistical follow-up, such as colocalisation, where the genomic architecture permits existing approaches not yet optimised for multiple secondary signals and outcomes, and cis-GRS evaluation in independent and adequately powered studies for the trait of interest.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.