Using Text Mining to Track Outbreak Trends in Global Surveillance of Emerging Diseases: ProMED-mail



Abstract

ProMED-mail (Program for Monitoring Emerging Diseases) is an international disease outbreak monitoring and early warning system. Every year, users contribute thousands of reports that refer to infectious diseases and toxins. However, because reports are unevenly distributed across diseases, traditional statistics-based text mining techniques, such as term-frequency-based algorithms, are not suitable. To bridge the gap between the data ProMED collects and its practical use, we conducted a study in three steps: (i) report filtering, (ii) keyword extraction from reports, and (iii) word co-occurrence network analysis. Keywords were extracted with the TextRank algorithm; keyword co-occurrence networks were then built from the top keywords of each document, and multiple network centrality measures were computed to analyse these networks. We used two major recent outbreaks, Ebola (2014) and Zika (2015), as cases to illustrate and validate the process. We found that the extracted information structures are consistent with the World Health Organisation's description of the timelines and phases of the epidemics. Our research presents a pipeline that extracts and organizes information to characterize the evolution of epidemic outbreaks. It also highlights the potential for ProMED to be used in monitoring, evaluating and improving responses to outbreaks.
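The abstract describes the keyword-extraction and network steps only in prose. The Python sketch below illustrates them under stated assumptions; it is not the authors' pipeline. TextRank is implemented here as PageRank over a word graph that links words appearing within a fixed window, without the part-of-speech filtering TextRank usually applies. The stopword list, window size, damping factor of 0.85, top-k cutoff, and the use of degree centrality as the single example measure are all assumptions, as are the hypothetical helper names (`tokenize`, `textrank_keywords`, `cooccurrence_network`, `degree_centrality`) and the toy reports, which are invented for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed, deliberately tiny stopword list (illustrative only).
STOPWORDS = {"the", "and", "for", "with", "from", "are", "was", "were",
             "has", "have", "this", "that", "into", "its"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and tokens shorter than 3 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def textrank_keywords(text, top_k=5, window=2, damping=0.85, iters=50):
    """Return the top_k words ranked by TextRank (PageRank on a word co-occurrence graph).

    Two words are linked when they occur within `window` consecutive tokens.
    """
    tokens = tokenize(text)
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Unweighted TextRank update: S(w) = (1 - d) + d * sum(S(v) / deg(v)) over neighbors v.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def cooccurrence_network(keyword_lists):
    """Build a weighted keyword network: an edge joins two keywords that appear in the
    same report's keyword list; its weight counts how many reports contain the pair."""
    weights = defaultdict(int)
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(edges):
    """Degree centrality of each node: distinct neighbors divided by (n - 1)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


if __name__ == "__main__":
    # Invented toy reports, not ProMED data.
    reports = [
        "Ebola virus disease outbreak confirmed in Guinea; Ebola cases rising.",
        "Ebola cases reported in Sierra Leone and Liberia as the outbreak spreads.",
        "Zika virus infection linked to microcephaly cases in Brazil.",
    ]
    keyword_lists = [textrank_keywords(r, top_k=4) for r in reports]
    network = cooccurrence_network(keyword_lists)
    centrality = degree_centrality(network)
    for word, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{word}: {score:.2f}")
```

Running the script prints each keyword's degree centrality in the toy network. With real reports, one would likely use a fuller stopword list and compare several centrality measures (for example betweenness or closeness), since the abstract mentions multiple measures without naming them.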

Article activity feed

  1. SciScore for 10.1101/2020.01.10.20017145

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    METHODS The pipeline shown in Figure 1 is built in Python. | Python; suggested: (IPython, SCR_001658)
    Evaluation of ProMED-mail as an electronic early warning system for emerging animal diseases: 1996 to 2004. | ProMED-mail; suggested: (ProMed-Mail, SCR_010260)

    Data from additional tools are added to each annotation on a weekly basis.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks whether authors address rigor criteria. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria it detects are appropriate for the particular study. Instead, it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. Details on the results shown here, including the references cited, are available in the full SciScore report.