LLM-as-Critic: Contrastive and Adversarial Strategies for Authentic Text Verification

Abstract

The rapid proliferation of sophisticated large language models (LLMs) has revolutionized content generation but concurrently poses significant challenges for distinguishing human-authored from AI-generated text. Traditional detection methods often struggle with the increasing fluency of LLM outputs and their vulnerability to adversarial manipulation. In response, we propose LLM-as-Critic, a novel discriminative framework that fine-tunes a pre-trained LLM to act as an expert judge of textual authenticity. Our method integrates a multi-objective training paradigm comprising a binary cross-entropy (BCE) loss for fundamental classification, a bespoke contrastive learning loss to maximize inter-class separation, and an adversarial training scheme to bolster robustness against sophisticated AI-generated content. Extensive experiments across diverse datasets, including news, creative writing, and academic papers, consistently demonstrate LLM-as-Critic's superior performance: it achieves F1 scores up to 0.97, significantly outperforming baselines such as perplexity-based detectors, stylometric feature analyzers, and fine-tuned RoBERTa classifiers. Ablation studies validate the incremental contribution of each training component, and human evaluation confirms a higher agreement rate with our model's classifications, reinforcing its practical utility. LLM-as-Critic establishes a new state of the art in AI-generated text detection, excelling in particular at generalization to unseen generators and resilience against adversarial attacks.
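The multi-objective paradigm described above amounts to minimizing a weighted sum L = L_BCE + λ_con · L_con + λ_adv · L_adv. The following is a minimal PyTorch sketch of one such training step, not the paper's actual implementation: the loss weights, the supervised-contrastive form of the contrastive term, the FGSM-style embedding perturbation used for the adversarial term, and the helper methods model.embed and model.forward_from_embeds are all illustrative assumptions, since the abstract does not specify these details.

```python
import torch
import torch.nn.functional as F

# Illustrative loss weights; the abstract does not report the actual values.
LAMBDA_CON, LAMBDA_ADV = 0.5, 0.5


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: pull same-class (human vs. AI)
    representations together and push the two classes apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, -1e9), dim=1)
    # Average log-probability of each sample's positives (clamp avoids 0-division).
    return -(log_prob * pos_mask).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()


def training_step(model, input_ids, attention_mask, labels, epsilon=1e-2):
    """One multi-objective step: BCE + contrastive + adversarial terms."""
    # 1) BCE on the critic's authenticity logit (assumes the model returns a
    #    scalar logit plus a pooled embedding per example).
    logits, emb = model(input_ids, attention_mask)
    l_bce = F.binary_cross_entropy_with_logits(logits.squeeze(-1), labels.float())

    # 2) Contrastive term over the pooled representations.
    l_con = supervised_contrastive_loss(emb, labels)

    # 3) Adversarial term via an FGSM-style perturbation of the input
    #    embeddings (one common instantiation of adversarial training; the
    #    exact attack is not specified in the abstract).
    embeds = model.embed(input_ids).detach().requires_grad_(True)
    adv_logits, _ = model.forward_from_embeds(embeds, attention_mask)
    clean = F.binary_cross_entropy_with_logits(adv_logits.squeeze(-1), labels.float())
    (grad,) = torch.autograd.grad(clean, embeds)
    adv_logits, _ = model.forward_from_embeds(embeds + epsilon * grad.sign(),
                                              attention_mask)
    l_adv = F.binary_cross_entropy_with_logits(adv_logits.squeeze(-1), labels.float())

    return l_bce + LAMBDA_CON * l_con + LAMBDA_ADV * l_adv
```

Note that the contrastive term operates over in-batch pairs, so under this sketch each batch should mix human-written and AI-generated samples; otherwise some examples have no positives and contribute nothing to that term.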
