Information Retrieval in an Infodemic: The Case of COVID-19 Publications

Douglas Teodoro
Sohrab Ferdowsi
Nikolay Borissov
Elham Kashani
David Vicente Alvarez
Jenny Copara
Racha Gouareb
Nona Naderi
Poorya Amini

Listed in

Evaluated articles (ScreenIT)

Abstract

The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, at a scale that is no longer amenable to human analysis.

Objective

In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language.

Methods

Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. COVID-19 queries are compared with documents by similarity, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and to boost the position of relevant documents.
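The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the first stage is a plain Okapi BM25 scorer written from the standard formula, and the second stage uses a simple word-overlap scorer as a stand-in for a deep neural reranker. The function names, the parameter defaults (k1, b, first_stage_k), and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_reranker(query, doc):
    """Hypothetical stand-in for a neural reranker: Jaccard word overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d)


def multistage_rank(query, docs, rerank_fn, first_stage_k=2):
    """Stage 1: keep the top BM25 candidates. Stage 2: reorder them."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)
    candidates = candidates[:first_stage_k]
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)


docs = [
    "coronavirus transmission in hospitals",
    "weather patterns in europe",
    "coronavirus transmission and weather seasonality",
]
query = "coronavirus weather"
ranking = multistage_rank(query, docs, overlap_reranker, first_stage_k=2)
print(ranking)  # indices of the reranked candidates, best first
```

In a real system the second-stage scorer would be replaced by a trained model; the point of the sketch is only that the expensive scorer sees a short candidate list rather than the whole corpus.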

Results

The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving, on average, 83% of relevant documents in the top 20.
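One way to read the top-20 figure is as precision at rank 20 averaged over queries, that is, the share of the first 20 results that were judged relevant. The abstract does not name the metric explicitly, so this reading is an assumption, and the function names and toy data below are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked documents that are judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=20):
    """Average precision at k over (ranking, relevant set) pairs, one per query."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Toy example with k=2: the first query has one hit, the second has two.
runs = [
    (["a", "b"], {"a"}),
    (["c", "d"], {"c", "d"}),
]
print(mean_precision_at_k(runs, k=2))  # 0.75
```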

Conclusions

These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.

Version published to 10.2196/30161
Sep 17, 2021
Version published to 10.2196/preprints.30161
May 3, 2021
Version published to 10.1101/2021.01.29.428847v2 on bioRxiv
Apr 12, 2021

SciScore for 10.1101/2021.01.29.428847:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
As shown in Figure 1, this is a large and dynamically growing semi-structured dataset from various sources like PubMed, PubMed Central, WHO and preprint servers like bioRxiv, medRxiv, and arXiv.	PubMed suggested: (PubMed, RRID:SCR_004846); bioRxiv suggested: (bioRxiv, RRID:SCR_003933); arXiv suggested: (arXiv, RRID:SCR_006500)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Version published to 10.1101/2021.01.29.428847v1 on bioRxiv
Jan 29, 2021

