CovidNLP : A Web Application for Distilling Systemic Implications of COVID-19 Pandemic with Natural Language Processing

Abstract

The flood of conflicting research on COVID-19 shows that the disease remains an enigma. Although more than 14,000 research articles on COVID-19 have been published as the disease reached pandemic proportions, clinicians and researchers are struggling to distill this knowledge to advance clinical management and research. In this study, we address this gap for a targeted user group (clinicians, researchers, and policymakers) by applying natural language processing to build the CovidNLP dashboard, which is intended to speed up knowledge discovery. The WHO has created a repository of more than 5000 peer-reviewed and curated research articles on varied aspects of the disease, including epidemiology, clinical features, diagnosis, treatment, social factors, and economics. We summarized all the articles in the WHO database with an extractive summarizer, then explored the feature space using word embeddings, which were used to visualize the summarized associations of COVID-19 found in the text. Clinicians, researchers, and policymakers can discover not only the direct effects of COVID-19 but also its systemic implications, such as the anticipated rise in TB and cancer mortality due to the non-availability of drugs during the export lockdown, as highlighted by our models. These results demonstrate the utility of mining massive literature with natural language processing for rapid distillation and knowledge updates. This can help users understand, synthesize, and take pre-emptive action with the available peer-reviewed evidence on COVID-19. Our models will be continuously updated with new literature, and we have made our resource CovidNLP publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in/ .
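The abstract does not say which extractive summarizer was used. As a rough illustration of the general idea only, the sketch below scores each sentence by the average corpus frequency of its words and keeps the top-scoring sentences in their original order. The function names, the stop-word list, and the scoring rule are all assumptions made for this example, not the authors' method.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real pipelines use larger lists).
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on",
    "with", "that", "this", "by", "as", "was", "were", "be", "has", "have",
    "it", "at", "from", "or",
}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9\-]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text act as a crude importance signal.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]
```

An extractive approach only selects existing sentences, so every sentence in the summary can be traced back to a source abstract, which matters when the summaries feed a clinical dashboard.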

Data Availability Statement

All the data used in this study are publicly available from the WHO COVID-19 Global Literature on Coronavirus Disease database maintained at https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/ . Our analysis and the interactive resource CovidNLP are publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in .

Article activity feed

  1. SciScore for 10.1101/2020.04.25.20079129:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Word-embedding models: A low-dimensional representation of the corpus trained through word2Vec algorithm[34] which was then visualized in 3 dimensions in order to aid the exploration of feature space.
    Resource: word2Vec
    Suggested: (word2vec, RRID:SCR_014776)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Strength, limitations and future work: This study provides a model and an interface for the clinicians, researchers and policymakers to extract relevant information in the face of confusion that surrounds COVID-19. We used only peer reviewed articles in our analysis to make sure that information is validated to some extent, also realizing that even the peer-reviewed literature is currently conflicted. Another limitation of the study is the availability of only the abstracts on the WHO resource and the relatively small size of the resource. Future work in this direction will include full-texts of the available peer reviewed articles, primarily for the purpose of better model tuning. We will also create sections on our dashboard for knowledge synthesized by including pre-prints, expert summaries of social media posts of expert organizations and people, Government & Ministry Reports, WHO Reports and various other resources for improved inference in terms of data resources.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
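The Resources table in the report above quotes the paper's word-embedding step, in which terms are mapped to vectors so that associated terms end up close together. The sketch below illustrates that idea of exploring associations in a vector space. It is not word2vec: it substitutes plain co-occurrence count vectors and cosine similarity so that it needs only the standard library, and it does not reproduce the 3-dimensional visualization. All function names and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(word, vectors, top_n=3):
    """Return the top_n words whose context vectors are most similar to `word`."""
    target = vectors.get(word, Counter())
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy, hand-written corpus (hypothetical; not taken from the WHO database).
corpus = [
    ["lockdown", "disrupted", "tb", "drug", "supply"],
    ["lockdown", "disrupted", "cancer", "drug", "supply"],
    ["fever", "cough", "common", "symptoms"],
]
vectors = cooccurrence_vectors(corpus)
print(nearest("tb", vectors, top_n=1))
```

In this toy corpus "tb" and "cancer" appear in identical contexts, so they come out as nearest neighbors, while symptom terms that never share a context with them get a similarity of zero. A trained word2vec model captures the same kind of contextual association in a dense, low-dimensional space.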