paperboy --- A Collection of News Media Scrapers in R

Abstract

The `R` package `paperboy` is conceived as a repository for webscraping scripts for news media sites, with advanced features for quick data retrieval, even for content behind log-ins or anti-scraping measures. Many data scientists and researchers write their own code when they need to retrieve news media content from websites. At the end of a research project, this code often collects digital dust on researchers' hard drives instead of being made public for others to use. `paperboy` offers writers of webscraping scripts a clear path to publish their code and earn co-authorship on the package, while promising users news media data from many websites in a consistent format. With 177 sites covered as of today and a default scraper that often works well enough, `paperboy` can already facilitate a wide range of research projects.
