The Cognitive Fingerprint Problem: Transformer-LSTM Perplexity Geometry for Fair and Adversarially Robust AI Text Detection

Abstract

As AI text detection systems become increasingly embedded in academic integrity enforcement, recent empirical audits have exposed a severe and systematic bias against English as a Second Language (ESL) writers. Standard detectors, which rely predominantly on sub-word perplexity measured against models trained on native English, frequently conflate the natural linguistic simplicity of human language learners with the statistical predictability of Large Language Models (LLMs). Furthermore, naive retraining strategies that incorporate ESL data to mitigate this bias introduce a critical new adversarial vulnerability: LLMs can be explicitly prompted to simulate ESL writing patterns---including grammatical errors and constrained vocabulary---allowing them to evade even ESL-aware detectors. The result is a scenario in which innocent ESL students are falsely accused of academic misconduct while a deliberately adversarial AI goes undetected. To resolve this dual challenge, we propose a novel feature-extraction architecture termed Geometric Fairness. Rather than measuring how "fluent" a text is on a single scale, this system maps every essay into a two-dimensional feature space defined by two orthogonal perplexity signals: (1) a Native-Expert, a pre-trained DistilGPT-2 Transformer computing sub-word perplexity against standard American English; and (2) an ESL-Expert, a custom Character-Level Long Short-Term Memory (LSTM) network trained exclusively on the ELLIPSE learner corpus, computing next-character perplexity. By contrasting sub-word probability under standard English against the sequence-level character variance of authentic learner interlanguage, our architecture geometrically separates the cognitive stochasticity of human error from the probabilistic smoothing of AI generation.
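The dual-stream mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-token negative log-likelihoods are synthetic stand-ins for what the DistilGPT-2 and character-LSTM experts would emit, and all names are hypothetical. It shows only the core transformation, perplexity as the exponential of the mean negative log-likelihood, applied once per expert to yield a 2D point per essay.

```python
import math

def perplexity(neg_log_likelihoods):
    # Perplexity = exp(mean NLL); lower values mean the expert model
    # found the text more predictable.
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Synthetic per-unit NLLs standing in for the two experts' outputs
# on one essay (values are illustrative, not from the paper):
native_nlls = [4.1, 3.8, 5.0, 4.4]  # sub-word NLLs from the Native-Expert
esl_nlls    = [1.9, 2.2, 2.0, 2.1]  # next-character NLLs from the ESL-Expert

# Each essay becomes a point (native_ppl, esl_ppl) in the 2D feature
# space; a downstream classifier separates the two populations there.
features = (perplexity(native_nlls), perplexity(esl_nlls))
```

In this geometry, authentic learner writing tends to look improbable to the native expert but predictable to the ESL expert, while AI text mimicking ESL style is smooth under both, which is what makes the two signals separable.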
On the most adversarial classification task considered---distinguishing authentic ESL student writing from AI-generated text specifically prompted to mimic ESL style---our dual-stream logistic regression classifier achieved an accuracy of 93.3%, outperforming a single-dimensional perplexity baseline of 75.0% by 18.3 percentage points. The dual-stream method achieved a Receiver Operating Characteristic Area Under the Curve (ROC-AUC) of 0.96, compared to 0.73 for the 1D ESL-only baseline. Critically, the false-positive rate---the proportion of genuine human ESL students who would be wrongly accused---was reduced from 29.0% to 9.7%, a relative reduction of 66.7%.
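As a check on the fairness arithmetic above, the sketch below recomputes the false-positive-rate reduction. The confusion counts are hypothetical, chosen only because they reproduce the reported rates (29.0% and 9.7%) and the stated 66.7% relative reduction; the paper's actual evaluation counts are not given here.

```python
def false_positive_rate(fp, tn):
    # FPR = FP / (FP + TN): the fraction of genuine human ESL essays
    # that the detector wrongly flags as AI-generated.
    return fp / (fp + tn)

# Hypothetical counts over 31 human ESL essays, consistent with the
# reported rates:
baseline_fpr = false_positive_rate(fp=9, tn=22)  # ~0.290 (1D baseline)
dual_fpr = false_positive_rate(fp=3, tn=28)      # ~0.097 (dual-stream)

# Relative reduction in wrongful accusations: (0.290 - 0.097) / 0.290
relative_reduction = (baseline_fpr - dual_fpr) / baseline_fpr  # ~2/3
```

With these counts the relative reduction is exactly 6/9 = 66.7%, matching the abstract's claim that roughly two-thirds of wrongful accusations are eliminated.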