Dynamic data-driven meta-analysis for prioritisation of host genes implicated in COVID-19

Abstract

The growing body of literature on the role of host factors in COVID-19 pathogenesis demonstrates the need to combine diverse, multi-omic data in order to identify the most robust evidence and inform the development of therapies. Here we present a dynamic ranking of host genes implicated in human betacoronavirus infection (SARS-CoV-2, SARS-CoV, MERS-CoV, seasonal coronaviruses). We conducted an extensive systematic review of experiments identifying potential host factors. Gene lists from diverse sources were integrated using Meta-Analysis by Information Content (MAIC). This previously described algorithm uses data-driven gene list weightings to produce a comprehensive ranked list of implicated host genes. Across 32 datasets, the top-ranked gene was PPIA, which encodes cyclophilin A, a target druggable with cyclosporine. Other highly ranked genes included proposed prognostic factors (CXCL10, CD4, CD3E) and investigational therapeutic targets (IL1A) for COVID-19. Gene rankings also inform the interpretation of COVID-19 GWAS results, implicating FYCO1 over other nearby genes in a disease-associated locus on chromosome 3. Researchers can search and review the gene rankings, and the contribution of different experimental methods to gene rank, at https://baillielab.net/maic/covid19. As new data are published, we will regularly update the list of genes as a resource to inform and prioritise future studies.

SciScore for 10.1101/2020.08.27.20182238:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Literature search: A systematic literature search of PubMed was conducted on 28/04/2020 and updated weekly until 06/07/2020.	PubMed suggested: (PubMed, RRID:SCR_004846)
Potentially relevant pre-print manuscripts were identified by screening all papers categorised as COVID-19-related in the bioRxiv and medRxiv servers.	bioRxiv suggested: (bioRxiv, RRID:SCR_003933)
Gene, transcript and protein names or identification numbers were converted to the associated HGNC gene symbol, or an equivalent Ensembl or Refseq symbol where no HGNC symbol existed.	Ensembl suggested: (Ensembl, RRID:SCR_002344) Refseq suggested: (RefSeq, RRID:SCR_003496)
Non-primate genes were mapped to their human homologues using the NCBI Homologene database,11 or excluded from the analysis if no human homologue could be identified.	NCBI Homologene suggested: None
Gene set enrichment analysis: Rank-based gene set enrichment analysis was performed using the package FGSEA in R version 3.5.2, with genes ranked by MAIC score.	Gene set enrichment analysis suggested: (Gene Set Enrichment Analysis, RRID:SCR_003199)
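The rank-based enrichment step in the last table row can be illustrated with a small sketch. This is not FGSEA and not the authors' R code: it is an unweighted, Kolmogorov-Smirnov-style running-sum statistic, written in Python purely for illustration, and it omits the score weighting and permutation-based significance testing that a real analysis would need. The function name `enrichment_score` and the gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment statistic (illustrative only).

    ranked_genes: list of gene symbols, best-ranked first
                  (e.g. ordered by a MAIC-style score).
    gene_set:     set of gene symbols to test for enrichment.
    Returns the running-sum value with the largest absolute deviation
    from zero: positive if the set clusters near the top of the ranking,
    negative if it clusters near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must overlap the ranking without covering it")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the step sizes
        # are chosen so the walk returns to zero at the end of the list.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy usage with made-up gene labels.
ranking = ["A", "B", "C", "D", "E", "F"]
top_set_score = enrichment_score(ranking, {"A", "B"})
bottom_set_score = enrichment_score(ranking, {"E", "F"})
```

A set concentrated at the top of the ranking produces a positive score, and one concentrated at the bottom produces a negative score.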

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Advantages and limitations of data integration via MAIC: The principal advantage of the MAIC approach is that it allows integration of data from diverse sources. Unlike other methods for gene list comparison such as vote counting or robust rank aggregation,49 MAIC applies a data-driven weighting to each dataset, accepts both ranked and unranked lists, and includes user-defined categories which prevent any single method from overwhelming the results. MAIC outperforms other methods for predicting antiviral genes.3 This meta-analysis is restricted to studies involving genome-wide hypotheses or screening data for large gene sets, and does not consider evidence from candidate gene genetic studies or single-gene perturbations. Where a single gene has been investigated extensively but genome-scale studies are sparse, our approach may underestimate the relative strength of evidence for certain genes. Single gene studies, however, are likely to focus preferentially on genes that fit pre-conceived ideas of disease pathogenesis and may be prone to other biases such as publication bias, something which we mitigated against in our inclusion criteria. Genetic perturbation data are still relatively sparse for SARS-CoV-2 and other human betacoronaviruses: only one genome-wide CRISPR knockout screen and two other sub-genome-scale screens (kinome-wide RNAi and interferon-stimulated gene overexpression screens) were included in the meta-analysis. Limited data of this type could be responsible f...
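The ideas of data-driven list weighting and per-category capping described above can be sketched in a few lines of Python. This is a simplified illustration, not the published MAIC implementation. It assumes unranked lists only, a list weight equal to the mean current score of the list's genes (rescaled so the strongest list has weight 1), and a gene score equal to the sum, over categories, of the best weight among the lists in that category that contain the gene. The function name `maic_like_scores`, the category labels, and the gene labels are all hypothetical.

```python
from collections import defaultdict


def maic_like_scores(gene_lists, n_iter=100, tol=1e-9):
    """Iteratively score genes and weight lists (simplified, assumption-laden).

    gene_lists: dict mapping list name -> (category, non-empty set of genes).
    Returns (gene_scores, list_weights) as two dicts.
    """
    all_genes = set().union(*(members for _, members in gene_lists.values()))
    scores = {gene: 1.0 for gene in all_genes}
    weights = {}

    for _ in range(n_iter):
        # A list is weighted by the mean current score of its genes,
        # rescaled so the best-supported list has weight 1.
        raw = {
            name: sum(scores[g] for g in members) / len(members)
            for name, (_, members) in gene_lists.items()
        }
        top = max(raw.values())
        new_weights = {name: value / top for name, value in raw.items()}

        # Within a category only the best list containing a gene counts,
        # so many lists of one experimental type cannot dominate.
        best_per_category = defaultdict(dict)
        for name, (category, members) in gene_lists.items():
            for g in members:
                previous = best_per_category[g].get(category, 0.0)
                best_per_category[g][category] = max(previous, new_weights[name])
        new_scores = {g: sum(per_cat.values()) for g, per_cat in best_per_category.items()}

        converged = bool(weights) and max(
            abs(new_weights[n] - weights[n]) for n in weights
        ) < tol
        scores, weights = new_scores, new_weights
        if converged:
            break

    return scores, weights


# Toy input: four lists in three hypothetical categories.
toy_lists = {
    "crispr_a": ("crispr", {"G1", "G2"}),
    "crispr_b": ("crispr", {"G1", "G3"}),
    "proteomics": ("proteomics", {"G1", "G2", "G4"}),
    "rnaseq": ("rnaseq", {"G2", "G5"}),
}
toy_scores, toy_weights = maic_like_scores(toy_lists)
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In this toy input, G2 is supported by three categories and G1 by two, so G2 ranks first and G1 second. G1's two CRISPR lists count only once because of the per-category cap.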

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Related articles

Host transcriptional profiling identifies B cell associated genes to be upregulated in individuals with asymptomatic COVID-19 and latent tuberculosis

Meta-analysis of functional genomics studies reveals conserved cellular pathways required by viruses of pandemic concern

Integrated Transcriptomic Analysis Reveals Distinct Immune Response Signatures and Prognostic Biomarkers in SARS-CoV-2 Infection