Vapur: A Search Engine to Find Related Protein - Compound Pairs in COVID-19 Literature
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Coronavirus Disease of 2019 (COVID-19) created dire consequences globally and triggered an intense scientific effort from different domains. The resulting publications created a huge text collection in which finding the studies related to a biomolecule of interest is challenging for general purpose search engines because the publications are rich in domain specific terminology. Here, we present Vapur: an online COVID-19 search engine specifically designed to find related protein - chemical pairs. Vapur is empowered with a relation-oriented inverted index that is able to retrieve and group studies for a query biomolecule with respect to its related entities. The inverted index of Vapur is automatically created with a BioNLP pipeline and integrated with an online user interface. The online interface is designed for the smooth traversal of the current literature by domain researchers and is publicly available at https://tabilab.cmpe.boun.edu.tr/vapur/ .
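A relation-oriented inverted index of the kind the abstract describes can be sketched as a nested mapping from a query entity to its related entities, and from each related entity to the documents that mention the pair. The sketch below is only an illustration under stated assumptions: the function names, the tuple layout, and the toy entity and document identifiers are hypothetical and are not taken from Vapur's code or data.

```python
from collections import defaultdict


def build_relation_index(relations):
    """Build a nested index: entity -> related entity -> set of document ids.

    `relations` is an iterable of (doc_id, protein, chemical) tuples, assumed
    here to be the output of an upstream relation-extraction step.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, protein, chemical in relations:
        p, c = protein.lower(), chemical.lower()
        # Index both directions so either biomolecule can serve as the query.
        index[p][c].add(doc_id)
        index[c][p].add(doc_id)
    return index


def search(index, query):
    """Return documents for `query`, grouped by related entity."""
    related = index.get(query.lower(), {})
    return {entity: sorted(docs) for entity, docs in sorted(related.items())}


# Hypothetical toy data, for illustration only.
relations = [
    ("doc1", "ProteinA", "CompoundX"),
    ("doc2", "ProteinA", "CompoundX"),
    ("doc3", "ProteinA", "CompoundY"),
    ("doc3", "ProteinB", "CompoundZ"),
]
index = build_relation_index(relations)
results = search(index, "ProteinA")
```

Grouping by related entity, rather than returning one flat list of documents, is what lets a single query present its studies per protein - chemical pair.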
Article activity feed
SciScore for 10.1101/2020.09.05.284224:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentence: To this end, we first preprocessed the sentences to explicitly encode the entities in the input and then finetuned BioBERT with the preprocessed sentences in ChemProt.
Resource: BioBERT, suggested: (BioBERT, RRID:SCR_017547)
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
We first analyzed 41 sample sentences in which Vapur identified a biochemical relation as a first step to discover the limitations of the complete pipeline. Then, we asked six biologists/chemists to use Vapur and rate its different aspects to demonstrate the success and usefulness of Vapur for future research. Our inspection of 41 sample sentences indicated that most of the incorrect relation labels were due to incorrect entity assignment by BERN. In some cases, parts of the protein sequence such as N-terminal, carboxyl terminal or residue names such as Asp238 are recognized as compounds. Table 7 illustrates sample sentences with incorrectly labeled entities. Other examples that were manually checked by a domain expert are presented in the Appendices. In order to evaluate the real-life usefulness of Vapur, we asked six domain experts to use Vapur for five COVID-19 related queries (totalling up to 30) of their own. They each filled in a questionnaire where for each query they indicated (i) if each of the top three search results is related to the query, (ii) if similar molecules predicted by Vapur are in fact useful, and (iii) if the extracted sentences are useful. They also rated the ease of use of Vapur between 1 (very difficult) and 5 (very easy) and assessed its usefulness for future research on COVID-19. The expert evaluations demonstrated that 27 out of 30 (90%) top search results and 76 out of 90 (84%) top three search results are biochemically related to the query, su...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.