pmparser and PMDB: resources for large-scale, open studies of the biomedical literature

Joshua L. Schoenbachler
Jacob J. Hughey


Listed in

Evaluated articles (PeerJ)

Abstract

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is stored in PostgreSQL and compressed dumps are available on Zenodo (https://doi.org/10.5281/zenodo.4008109).

PeerJ
Mar 11, 2021

Version published to 10.1101/2020.09.07.285924 on bioRxiv
Sep 9, 2020
