paperboy --- A Collection of News Media Scrapers in R

Johannes B. Gruber


Abstract

The `R` package `paperboy` is designed as a repository for web scraping scripts for news media sites, with advanced features for quick data retrieval, even for content behind log-ins or anti-scraping measures. Many data scientists and researchers write their own code when they have to retrieve news media content from websites. At the end of research projects, this code often gathers digital dust on researchers' hard drives instead of being made public for others to use. `paperboy` offers writers of web scraping scripts a clear path to publish their code and earn co-authorship on the package, while promising users news media data from many websites in a consistent format. With 177 sites covered as of today and a default scraper that often works well enough, `paperboy` can already facilitate a wide range of research projects.

Version published to 10.31235/osf.io/hu6qw_v1 on OSF Preprints
Jun 23, 2025

Related articles

@Grok Is This True? LLM-Powered Fact-Checking on Social Media

This article has 3 authors:
1. Thomas Renault
2. Mohsen Mosleh
3. David Gertler Rand
This article has no evaluations. Latest version: Jan 21, 2026

The Table of Media Bias Elements: A sentence-level taxonomy of media bias types and propaganda techniques

This article has 2 authors:
1. Tim Menzner
2. Jochen L. Leidner
This article has no evaluations. Latest version: Jan 13, 2026

Exploration of Large Language Models for Geotagging of Social Media Posts

This article has 2 authors:
1. Riwaz Udas
2. Richard Sinnott
This article has no evaluations. Latest version: Feb 3, 2026
