pmparser and PMDB: resources for large-scale, open studies of the biomedical literature

Abstract

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is available in both PostgreSQL (DOI 10.5281/zenodo.4008109) and Google BigQuery (https://console.cloud.google.com/bigquery?project=pmdb-bq&d=pmdb).
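
To illustrate why a relational layout is more convenient than nested XML for this kind of question, the following is a minimal sketch in Python. It is not pmparser's code and does not reproduce PMDB's schema: the sample XML is made up and only mimics the general shape of PubMed records, and the table names (`article`, `author`), column names, and helper functions are hypothetical choices for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample that only mimics the general nesting of PubMed XML records.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>First example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Smith</LastName></Author>
          <Author><LastName>Jones</LastName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Second example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Smith</LastName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def load_into_db(xml_text, conn):
    """Flatten nested records into two tables keyed by PMID (hypothetical schema)."""
    conn.execute("CREATE TABLE article (pmid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author (pmid INTEGER, author_pos INTEGER, last_name TEXT)"
    )
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle")
        conn.execute("INSERT INTO article VALUES (?, ?)", (pmid, title))
        # One row per author, preserving the author's position in the list.
        authors = citation.iterfind("Article/AuthorList/Author")
        for pos, author in enumerate(authors, start=1):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?)",
                (pmid, pos, author.findtext("LastName")),
            )
    conn.commit()


def articles_per_author(conn):
    """Count distinct articles per last name with a single SQL aggregate."""
    return conn.execute(
        "SELECT last_name, COUNT(DISTINCT pmid) AS n "
        "FROM author GROUP BY last_name ORDER BY n DESC, last_name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_into_db(SAMPLE_XML, conn)
print(articles_per_author(conn))
```

Once the records are in tables, a question such as "how many articles per author" becomes one aggregate query rather than a traversal of every XML document. The same idea would presumably carry over to queries against PMDB in PostgreSQL or BigQuery, using its actual table and column names.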
