From Generation to Detection: Leveraging Empirically Derived Linguistic Hints for LLM-Based Fake News Detection


Abstract

The rapid advancement and widespread use of Large Language Models (LLMs) have raised concerns about their potential to generate persuasive and deceptive content at scale. As LLM-generated text becomes increasingly indistinguishable from human writing, identifying linguistic features in LLM-generated fake news is critical for both the detection and mitigation of fake news. This research examines linguistic differences between real news and LLM-generated fake news headlines across four dimensions: toxicity, sentiment, moral framing, and lexical similarity. Using prompt engineering, we created five datasets of AI-generated fake news headlines (∼22,000 each) and compared them with a dataset of real news headlines of the same size. Our analysis shows that LLM-generated fake news is more toxic, negative, and subjective, and relies more heavily on authority-based language. To verify the effectiveness of these linguistic features, we conducted classification experiments using two state-of-the-art LLMs (GPT-4o-mini and GPT-5-mini) under zero-shot, few-shot, and linguistically guided conditions. Our results show that while zero-shot performance is modest (GPT-4o-mini mean F1 = 67.89%, GPT-5-mini mean F1 = 58.47%), both few-shot and linguistic hint approaches achieve consistent and robust improvements over zero-shot, with mean F1 scores in a similar range (≈ 70%), indicating that linguistic markers can be systematically leveraged to improve automated fake news detection. Overall, these findings suggest that LLMs produce consistent linguistic features and that such features can be effectively exploited in scalable fake news detection strategies.
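
To make the linguistically guided condition concrete, the sketch below shows one way a hint-augmented zero-shot headline classification call could look, assuming the OpenAI Chat Completions API. The hint wording, label scheme, and helper names are illustrative assumptions, not the authors' exact prompts or evaluation code.

```python
# Minimal sketch of linguistically guided (hint-based) zero-shot headline
# classification, assuming the OpenAI Chat Completions Python SDK.
# The hint text and function names below are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative hint summarizing the linguistic markers reported in the abstract.
LINGUISTIC_HINT = (
    "Hint: LLM-generated fake news headlines tend to be more toxic, more "
    "negative, more subjective, and to rely more heavily on authority-based "
    "language than real news headlines."
)

def classify_headline(headline: str, use_hint: bool = True) -> str:
    """Return 'fake' or 'real' for a single headline (zero-shot, optionally with a linguistic hint)."""
    system_msg = "You are a fake news detector. Answer with exactly one word: fake or real."
    user_msg = (f"{LINGUISTIC_HINT}\n\n" if use_hint else "") + f"Headline: {headline}\nLabel:"
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

if __name__ == "__main__":
    print(classify_headline("Experts warn this everyday food secretly controls your mind"))
```

A few-shot variant would simply prepend a handful of labeled headline examples to the user message before the target headline, leaving the rest of the call unchanged.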
