Dynamic data-driven meta-analysis for prioritisation of host genes implicated in COVID-19

Abstract

The increasing body of literature describing the role of host factors in COVID-19 pathogenesis demonstrates the need to combine diverse, multi-omic data to evaluate and substantiate the most robust evidence and inform development of therapies. Here we present a dynamic ranking of host genes implicated in human betacoronavirus infection (SARS-CoV-2, SARS-CoV, MERS-CoV, seasonal coronaviruses). We conducted an extensive systematic review of experiments identifying potential host factors. Gene lists from diverse sources were integrated using Meta-Analysis by Information Content (MAIC). This previously described algorithm uses data-driven gene list weightings to produce a comprehensive ranked list of implicated host genes. From 32 datasets, the top ranked gene was PPIA, encoding cyclophilin A, a druggable target using cyclosporine. Other highly ranked genes included proposed prognostic factors (CXCL10, CD4, CD3E) and investigational therapeutic targets (IL1A) for COVID-19. Gene rankings also inform the interpretation of COVID-19 GWAS results, implicating FYCO1 over other nearby genes in a disease-associated locus on chromosome 3. Researchers can search and review the gene rankings and the contribution of different experimental methods to gene rank at https://baillielab.net/maic/covid19. As new data are published we will regularly update the list of genes as a resource to inform and prioritise future studies.

Article activity feed

  1. SciScore for 10.1101/2020.08.27.20182238:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Literature search: A systematic literature search of PubMed was conducted on 28/04/2020 and updated weekly until 06/07/2020. | PubMed; suggested: (PubMed, RRID:SCR_004846)
    Potentially relevant pre-print manuscripts were identified by screening all papers categorised as COVID-19-related in the bioRxiv and medRxiv servers. | bioRxiv; suggested: (bioRxiv, RRID:SCR_003933)
    Gene, transcript and protein names or identification numbers were converted to the associated HGNC gene symbol, or an equivalent Ensembl or Refseq symbol where no HGNC symbol existed. | Ensembl; suggested: (Ensembl, RRID:SCR_002344). Refseq; suggested: (RefSeq, RRID:SCR_003496)
    Non-primate genes were mapped to their human homologues using the NCBI Homologene database [11], or excluded from the analysis if no human homologue could be identified. | NCBI Homologene; suggested: None
    Gene set enrichment analysis: Rank-based gene set enrichment analysis was performed using the package FGSEA in R version 3.5.2, with genes ranked by MAIC score. | Gene set enrichment analysis; suggested: (Gene Set Enrichment Analysis, RRID:SCR_003199)
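
The identifier handling quoted in the table (prefer an HGNC symbol, fall back to an Ensembl or RefSeq identifier, map non-primate genes to a human homologue or exclude them) can be illustrated with a minimal Python sketch. This is not the authors' code: the lookup tables, the toy identifiers and the function names `to_human_symbol` and `normalise_gene_list` are all hypothetical, and returning `None` for an identifier found in no table is an assumption, since the quoted text does not say how such cases were handled.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy lookup tables (hypothetical). A real pipeline would load these from
# HGNC, Ensembl, RefSeq and HomoloGene exports instead of hard-coding them.
HGNC_SYMBOLS: Dict[str, str] = {"TOY_PROT_1": "PPIA", "PPIA": "PPIA"}
ENSEMBL_FALLBACK: Dict[str, str] = {"TOY_TX_2": "ENSG_TOY_2"}
REFSEQ_FALLBACK: Dict[str, str] = {"TOY_TX_3": "NM_TOY_3"}
HUMAN_HOMOLOGUES: Dict[str, str] = {"Ppia": "PPIA"}  # non-primate -> human


def to_human_symbol(identifier: str, is_primate: bool = True) -> Optional[str]:
    """Return a human gene label for one identifier, or None to exclude it."""
    if not is_primate:
        human = HUMAN_HOMOLOGUES.get(identifier)
        if human is None:
            return None  # no human homologue: excluded from the analysis
        identifier = human
    # Prefer HGNC, then fall back to Ensembl, then RefSeq.
    for table in (HGNC_SYMBOLS, ENSEMBL_FALLBACK, REFSEQ_FALLBACK):
        if identifier in table:
            return table[identifier]
    return None  # assumption: unmapped identifiers are dropped


def normalise_gene_list(entries: Iterable[Tuple[str, bool]]) -> List[str]:
    """Map (identifier, is_primate) pairs to unique labels, keeping order."""
    seen, result = set(), []
    for identifier, is_primate in entries:
        symbol = to_human_symbol(identifier, is_primate)
        if symbol is not None and symbol not in seen:
            seen.add(symbol)
            result.append(symbol)
    return result
```

Deduplicating after mapping matters in a sketch like this because a protein identifier and a non-primate homologue can resolve to the same human gene, which should then appear only once in the list.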

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Advantages and limitations of data integration via MAIC: The principal advantage of the MAIC approach is that it allows integration of data from diverse sources. Unlike other methods for gene list comparison such as vote counting or robust rank aggregation [49], MAIC applies a data-driven weighting to each dataset, accepts both ranked and unranked lists, and includes user-defined categories which prevent any single method from overwhelming the results. MAIC outperforms other methods for predicting antiviral genes [3]. This meta-analysis is restricted to studies involving genome-wide hypotheses or screening data for large gene sets, and does not consider evidence from candidate gene genetic studies or single-gene perturbations. Where a single gene has been investigated extensively but genome-scale studies are sparse, our approach may underestimate the relative strength of evidence for certain genes. Single-gene studies, however, are likely to focus preferentially on genes that fit pre-conceived ideas of disease pathogenesis and may be prone to other biases such as publication bias, which we mitigated through our inclusion criteria. Genetic perturbation data are still relatively sparse for SARS-CoV-2 and other human betacoronaviruses: only one genome-wide CRISPR knockout screen and two other sub-genome-scale screens (kinome-wide RNAi and interferon-stimulated gene overexpression screens) were included in the meta-analysis. Limited data of this type could be responsible f...
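
The two properties named above, a data-driven weight per dataset and categories that stop one method from dominating, can be illustrated with a toy Python sketch. It is a simplified scheme written for this illustration and should not be read as the published MAIC algorithm: the function name `rank_genes`, the update rule (a list's weight is the mean score of its genes, rescaled so the largest weight is 1), the stopping tolerance and the treatment of every list as unranked are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GeneList = Tuple[str, str, List[str]]  # (list name, category, genes)


def rank_genes(
    lists: List[GeneList], iterations: int = 50, tol: float = 1e-9
) -> Tuple[List[str], Dict[str, float], Dict[str, float]]:
    """Iteratively weight gene lists and score genes (toy scheme)."""
    weights = {name: 1.0 for name, _, _ in lists}  # start with equal weights
    scores: Dict[str, float] = {}
    for _ in range(iterations):
        # Per gene and category, keep only the best-weighted supporting list,
        # so many lists from one method cannot pile up on a single gene.
        best: Dict[str, Dict[str, float]] = defaultdict(dict)
        for name, category, genes in lists:
            for gene in genes:
                current = best[gene].get(category, 0.0)
                best[gene][category] = max(current, weights[name])
        scores = {g: sum(per_cat.values()) for g, per_cat in best.items()}

        # Data-driven weight: a list is trusted more when its genes score well.
        raw = {
            name: sum(scores[g] for g in genes) / max(len(genes), 1)
            for name, _, genes in lists
        }
        top = max(raw.values(), default=0.0) or 1.0
        new_weights = {name: value / top for name, value in raw.items()}

        change = max(
            (abs(new_weights[n] - weights[n]) for n in weights), default=0.0
        )
        weights = new_weights
        if change < tol:
            break

    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked, scores, weights
```

In this toy scheme, a gene supported by two lists from the same category gains no more than a gene supported by one, while support from a second category adds to its score. A list whose genes are corroborated nowhere else loses weight on each pass, which is one simple way to express "data-driven weighting" but is not necessarily how MAIC itself treats such lists.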

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.