Applying Lexical Link Analysis to Discover Insights from Public Information on COVID-19

Abstract

SARS-CoV-2, a deadly novel virus, has caused a worldwide pandemic and drastic losses of human life and economic activity. An open data set, the COVID-19 Open Research Dataset (CORD-19), contains a large collection of full-text scientific literature on SARS-CoV-2. Next Strain provides a database of SARS-CoV-2 viral genomes collected since 12/3/2019. We applied a unique information mining method, lexical link analysis (LLA), to answer the call to action and help the scientific community address high-priority scientific questions related to SARS-CoV-2. We first text-mined CORD-19, then data-mined the Next Strain database, and finally linked the two databases. The linked databases and information can be used to discover insights and help the research community address high-priority questions about SARS-CoV-2 genetics, tests, and prevention.
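The abstract does not detail how LLA forms its links. As a rough illustration only, the sketch below assumes that a lexical link is an adjacent word pair (counted after stop-word removal) and that a theme is a connected group of linked words. The stop-word list and the names `word_pair_links` and `themes` are hypothetical, and the grouping step (connected components via union-find) is a deliberate simplification, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric tokens (hyphens allowed), drop stop words."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_pair_links(documents, min_count=1):
    """Count adjacent word pairs across all documents; each pair is treated as one lexical link.

    Because stop words are removed first, a pair may span a dropped stop word.
    """
    links = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for left, right in zip(tokens, tokens[1:]):
            links[(left, right)] += 1
    return {pair: n for pair, n in links.items() if n >= min_count}


def themes(links):
    """Group words into themes: connected components of the word-pair graph (union-find)."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for word in parent:
        groups.setdefault(find(word), set()).add(word)
    # Largest themes first; words within a theme sorted alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


if __name__ == "__main__":
    docs = [
        "Spike protein mutations in SARS-CoV-2 genomes",
        "Antibody tests detect spike protein",
        "Incubation period estimates",
    ]
    links = word_pair_links(docs)
    print(links[("spike", "protein")])  # prints 2
    for theme in themes(links):
        print(theme)
```

Word pairs, unlike a plain bag of words, keep some local context, which is why this toy example separates the spike/testing vocabulary from the incubation vocabulary.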

In this paper, we show how to apply a unique information mining method, lexical link analysis (LLA), to link unstructured (CORD-19) and structured (Next Strain) data sets to relevant publications, and how to integrate text and data mining into a single platform to discover insights that can be visualized and validated to answer high-priority questions about the genetics, incubation, treatment, symptoms, and prevention of COVID-19.
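Linking structured genome records to unstructured text can be pictured, under stated assumptions, as matching record field values against each publication's vocabulary. The sketch below is hypothetical: the `GenomeRecord` fields are illustrative and are not the actual Next Strain metadata schema, and the score (a count of shared terms) is chosen for clarity, not taken from the paper's linking rule.

```python
import re
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    """Hypothetical structured record; field names are illustrative, not the Next Strain schema."""
    strain: str
    country: str
    clade: str


def terms(text):
    """Return the set of lowercased alphanumeric tokens (hyphens kept) in the text."""
    return set(re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower()))


def link_records_to_papers(records, papers):
    """Map each record's strain name to IDs of papers sharing at least one field term.

    Papers are ordered by number of shared terms (descending), then by ID.
    """
    links = {}
    for rec in records:
        rec_terms = terms(f"{rec.country} {rec.clade}")
        scored = []
        for paper_id, text in papers.items():
            overlap = len(rec_terms & terms(text))
            if overlap:
                scored.append((overlap, paper_id))
        scored.sort(key=lambda s: (-s[0], s[1]))
        links[rec.strain] = [paper_id for _, paper_id in scored]
    return links


if __name__ == "__main__":
    # Made-up records and paper snippets, for illustration only.
    records = [
        GenomeRecord("strain-1", "Italy", "20A"),
        GenomeRecord("strain-2", "China", "19A"),
    ]
    papers = {
        "p1": "Early transmission of clade 19A in China",
        "p2": "Spread of clade 20A in Italy and Europe",
        "p3": "Incubation period estimates",
    }
    print(link_records_to_papers(records, papers))
```

Ranking by shared-term count keeps the example transparent; a real pipeline would likely need to normalize place names and clade labels before matching.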

SciScore for 10.1101/2020.05.06.079798:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to this paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
LLA is related to but significantly different from the methods such as bag-of-words (BOW) methods, Automap (7), Latent Dirichlet Allocation (LDA) (14), Latent Semantic Analysis (LSA) (16), Probabilistic Latent Semantic Analysis (PLSA) (17) and can be jointly used with NEE (20, 21), POS methods (25).	Automap suggested: (AutoMap, RRID:SCR_013095)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
