Knowledge synthesis of 100 million biomedical documents augments the deep expression profiling of coronavirus receptors

Abstract

The COVID-19 pandemic demands assimilation of all biomedical knowledge to decode mechanisms of pathogenesis. Despite the recent renaissance in neural networks, a platform for the real-time synthesis of the exponentially growing biomedical literature and deep omics insights is unavailable. Here, we present the nferX platform for dynamic inference from over 45 quadrillion possible conceptual associations from unstructured text, and triangulation with insights from single-cell RNA-sequencing, bulk RNA-seq and proteomics from diverse tissue types. A hypothesis-free profiling of ACE2 suggests tongue keratinocytes, olfactory epithelial cells, airway club cells and respiratory ciliated cells as potential reservoirs of the SARS-CoV-2 receptor. We find the gut as the putative hotspot of COVID-19, where a maturation correlated transcriptional signature is shared in small intestine enterocytes among coronavirus receptors (ACE2, DPP4, ANPEP). A holistic data science platform triangulating insights from structured and unstructured data holds potential for accelerating the generation of impactful biological insights and hypotheses.

SciScore for 10.1101/2020.03.24.005702: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Single-cell data processing pipeline: For each study, a counts matrix was downloaded from a public data repository such as the Gene Expression Omnibus (GEO) or the Broad Institute Single Cell Portal (Table S1).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO, RRID:SCR_005012)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We …

SciScore for 10.1101/2020.03.24.005702: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Single-cell data processing pipeline: For each study, a counts matrix was downloaded from a public data repository such as the Gene Expression Omnibus (GEO) or the Broad Institute Single Cell Portal (Table S1).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO, RRID:SCR_005012)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicit state so.
No funding statement was detected.
No protocol registration statement was detected.

Read the original source

Knowledge synthesis of 100 million biomedical documents augments the deep expression profiling of coronavirus receptors

This article has been Reviewed by the following groups

Listed in

Abstract

Article activity feed