Synthesizing Social Media Posts: The Development and Validation of Synthetic Tweets for Misinformation Research

Abstract

This paper describes the development and validation of a synthetic dataset of social media posts designed to support misinformation education research. Because existing datasets of real social media content are topically narrow, inconsistently labeled, and often unsuitable for training use, the authors used ChatGPT to generate a large bank of tweets. Each tweet exemplifies one of eight rhetorical manipulation tactics (including ad hominem attacks, emotional language, false dichotomies, and slippery slope arguments) or, for control items, none. Posts underwent manual review and refinement to improve authenticity and reduce bias, yielding 374 final stimuli rendered as realistic tweet images. A validation study with a nationally representative sample of 480 U.S. participants assessed the posts on dimensions including trustworthiness, shareability, and emotional arousal. Results confirmed that the synthetic posts were distinguishable by tactic while remaining representative of authentic social media discourse, supporting their use as training stimuli in misinformation literacy interventions.
