Knowledge synthesis from 100 million biomedical documents augments the deep expression profiling of coronavirus receptors

AJ Venkatakrishnan
Arjun Puranik
Akash Anand
David Zemmour
Xiang Yao
Xiaoying Wu
Ramakrishna Chilaka
Dariusz K. Murakowski
Kristopher Standish
Bharathwaj Raghunathan
Tyler Wagner
Enrique Garcia-Rivera
Hugo Solomon
Abhinav Garg
Rakesh Barve
Anuli Anyanwu-Ofili
Najat Khan
Venky Soundararajan

This article has been Reviewed by the following groups

Listed in

Evaluated articles (ScreenIT)

Abstract

The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission. Despite the recent renaissance in unsupervised neural networks for decoding unstructured natural languages, a platform for the real-time synthesis of the exponentially growing biomedical literature and its comprehensive triangulation with deep omic insights is not available. Here, we present the nferX platform for dynamic inference from over 45 quadrillion possible conceptual associations extracted from unstructured biomedical text, and their triangulation with Single Cell RNA-sequencing based insights from over 25 tissues. Using this platform, we identify intersections between the pathologic manifestations of COVID-19 and the comprehensive expression profile of the SARS-CoV-2 receptor ACE2. We find that tongue keratinocytes, airway club cells, and ciliated cells are likely underappreciated targets of SARS-CoV-2 infection, in addition to type II pneumocytes and olfactory epithelial cells. We further identify mature small intestinal enterocytes as a possible hotspot of COVID-19 fecal-oral transmission, where an intriguing maturation-correlated transcriptional signature is shared between ACE2 and the other coronavirus receptors DPP4 (MERS-CoV) and ANPEP (α-coronavirus). This study demonstrates how a holistic data science platform can leverage unprecedented quantities of structured and unstructured publicly available data to accelerate the generation of impactful biological insights and hypotheses.
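The abstract does not say how the platform scores a "conceptual association" between two terms. Purely as an illustration of the general idea, the sketch below scores concept pairs by document-level co-occurrence using pointwise mutual information (PMI). PMI is a standard technique swapped in here for clarity. It is not the nferX scoring method, and the function name `pmi_scores` and the toy concept lists are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents, min_count=1):
    """Hypothetical sketch: PMI (in bits) between concepts co-occurring in a document.

    documents: iterable of iterables of concept strings, one entry per document.
    Returns a dict mapping sorted (a, b) concept tuples to their PMI score.
    This is plain co-occurrence PMI, not the nferX association metric.
    """
    n_docs = 0
    concept_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Count each concept at most once per document.
        concepts = sorted(set(doc))
        n_docs += 1
        concept_counts.update(concepts)
        pair_counts.update(combinations(concepts, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_count:
            continue  # skip rare pairs, whose PMI estimates are noisy
        p_ab = n_ab / n_docs
        p_a = concept_counts[a] / n_docs
        p_b = concept_counts[b] / n_docs
        # PMI > 0: the pair co-occurs more often than expected under independence.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    toy_docs = [
        ["ACE2", "SARS-CoV-2", "lung"],
        ["ACE2", "SARS-CoV-2"],
        ["DPP4", "MERS-CoV"],
        ["lung", "asthma"],
    ]
    for pair, score in sorted(pmi_scores(toy_docs).items()):
        print(pair, round(score, 3))
```

At the scale the abstract describes, exhaustive pair counting like this would not be practical; the sketch only shows what a single co-occurrence-based association score means.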

The nferX Platform Single-cell resource - https://academia.nferx.com/

SciScore for 10.1101/2020.03.24.005702: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Single-cell data processing pipeline: For each study, a counts matrix was downloaded from a public data repository such as the Gene Expression Omnibus (GEO) or the Broad Institute Single Cell Portal (Table S1).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO, RRID:SCR_005012))
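The resource entry above states only that a counts matrix was downloaded for each study. A minimal sketch of one plausible downstream step follows: summarizing a receptor gene such as ACE2 per cell type, assuming a dense cells × genes array of raw counts with one cell-type label per cell. The function name, the counts-per-10,000 plus log1p normalization, and the toy data are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def summarize_gene_by_cell_type(counts, genes, cell_types, gene, scale=10_000):
    """Hypothetical sketch: per-cell-type expression summary for one gene.

    counts: (n_cells, n_genes) array-like of raw counts.
    genes: list of gene names, one per column of `counts`.
    cell_types: sequence of cell-type labels, one per row of `counts`.
    Normalization (counts per `scale`, then log1p) is an assumed, common choice.
    """
    counts = np.asarray(counts, dtype=float)
    col = genes.index(gene)

    # Library-size normalization; empty cells get size 1 to avoid division by zero.
    library_sizes = counts.sum(axis=1, keepdims=True)
    library_sizes[library_sizes == 0] = 1.0
    log_norm = np.log1p(counts / library_sizes * scale)

    labels = np.asarray(cell_types)
    summary = {}
    for ct in np.unique(labels):
        mask = labels == ct
        raw = counts[mask, col]
        summary[str(ct)] = {
            "n_cells": int(mask.sum()),
            # Fraction of cells in this cell type with at least one count.
            "frac_expressing": float((raw > 0).mean()),
            "mean_log_norm": float(log_norm[mask, col].mean()),
        }
    return summary


if __name__ == "__main__":
    toy_counts = [[1, 9], [3, 7], [0, 10], [0, 5]]
    toy_genes = ["ACE2", "GAPDH"]
    toy_labels = ["enterocyte", "enterocyte", "club", "club"]
    print(summarize_gene_by_cell_type(toy_counts, toy_genes, toy_labels, "ACE2"))
```

The same call with `gene="DPP4"` or `gene="ANPEP"` would give comparable summaries for the other receptors named in the abstract, provided those genes are columns of the matrix.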

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
No funding statement was detected.
No protocol registration statement was detected.

Version published to 10.1101/2020.03.24.005702 on bioRxiv
Mar 29, 2020