LLM-as-Critic: Contrastive and Adversarial Strategies for Authentic Text Verification

Abstract

The rapid proliferation of sophisticated large language models (LLMs) has revolutionized content generation but concurrently poses significant challenges for distinguishing human-authored from AI-generated text. Traditional detection methods often struggle with the increasing fluency of LLM outputs and their vulnerability to adversarial manipulation. In response, we propose LLM-as-Critic, a novel discriminative framework that fine-tunes a pre-trained LLM to act as an expert judge of textual authenticity. Our method integrates a multi-objective training paradigm comprising a binary cross-entropy (BCE) loss for fundamental classification, a bespoke contrastive learning loss to maximize inter-class separation, and an adversarial training scheme to bolster robustness against sophisticated AI-generated content. Extensive experiments across diverse datasets, including news, creative writing, and academic papers, consistently demonstrate LLM-as-Critic's superior performance: it achieves F1 scores up to 0.97, significantly outperforming baselines such as perplexity-based detectors, stylometric feature analyzers, and fine-tuned RoBERTa classifiers. Ablation studies validate the incremental contribution of each training component, and human evaluation confirms a higher agreement rate with our model's classifications, reinforcing its practical utility. LLM-as-Critic establishes a new state of the art in AI-generated text detection, excelling in particular at generalization to unseen generators and resilience against adversarial attacks.
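The multi-objective paradigm described above amounts to minimizing a weighted sum L = L_BCE + λ_con · L_con + λ_adv · L_adv. The following is a minimal PyTorch sketch of one such training step, not the paper's actual implementation: the loss weights, the supervised-contrastive form of the contrastive term, the FGSM-style embedding perturbation used for the adversarial term, and the helper methods model.embed and model.forward_from_embeds are all illustrative assumptions, since the abstract does not specify these details.

```python
import torch
import torch.nn.functional as F

# Illustrative loss weights; the abstract does not report the actual values.
LAMBDA_CON, LAMBDA_ADV = 0.5, 0.5


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: pull same-class (human vs. AI)
    representations together and push the two classes apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, -1e9), dim=1)
    # Average log-probability of each sample's positives (clamp avoids 0-division).
    return -(log_prob * pos_mask).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()


def training_step(model, input_ids, attention_mask, labels, epsilon=1e-2):
    """One multi-objective step: BCE + contrastive + adversarial terms."""
    # 1) BCE on the critic's authenticity logit (assumes the model returns a
    #    scalar logit plus a pooled embedding per example).
    logits, emb = model(input_ids, attention_mask)
    l_bce = F.binary_cross_entropy_with_logits(logits.squeeze(-1), labels.float())

    # 2) Contrastive term over the pooled representations.
    l_con = supervised_contrastive_loss(emb, labels)

    # 3) Adversarial term via an FGSM-style perturbation of the input
    #    embeddings (one common instantiation of adversarial training; the
    #    exact attack is not specified in the abstract).
    embeds = model.embed(input_ids).detach().requires_grad_(True)
    adv_logits, _ = model.forward_from_embeds(embeds, attention_mask)
    clean = F.binary_cross_entropy_with_logits(adv_logits.squeeze(-1), labels.float())
    (grad,) = torch.autograd.grad(clean, embeds)
    adv_logits, _ = model.forward_from_embeds(embeds + epsilon * grad.sign(),
                                              attention_mask)
    l_adv = F.binary_cross_entropy_with_logits(adv_logits.squeeze(-1), labels.float())

    return l_bce + LAMBDA_CON * l_con + LAMBDA_ADV * l_adv
```

Note that the contrastive term operates over in-batch pairs, so under this sketch each batch should mix human-written and AI-generated samples; otherwise some examples have no positives and contribute nothing to that term.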
