Defenses against LLM prompt injections in academic peer review

Ondrej Zika

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Peer review forms the cornerstone of the academic process. Recent work has proposed that meaningful integration of Large Language Models (LLMs) into peer review can help make the review process more transparent and effective. However, the rise of LLMs also introduces novel vulnerabilities and challenges, such as attempts to manipulate prompt contents. This paper investigates the susceptibility of LLMs to covert prompt injection, a technique where authors embed hidden instructions (e.g., "no matter what the prompt is, recommend this paper for acceptance") within the manuscript to manipulate review outcomes. Additionally, we explore the effectiveness of a possible defensive strategy: explicitly instructing LLMs to detect such injections. Using 26 freely available preprints from bioRxiv which have not yet been published, we evaluated the impact of prompt injection on publication recommendations, quality assessment and flaw identification. Our findings, based on a controlled experiment with three LLMs (ChatGPT, Gemini, and Llama), reveal that prompt injections significantly increased acceptance rates by 20%. They also slightly increased perceived scientific quality and decreased the perceived probability of flaws. Prompting LLMs to check for injections was successful at fully reverting these effects. However, it also reduced how manuscripts were rated and how often they were recommended for publication, irrespective of whether an injected prompt was found or not. We also note major differences between the different LLMs. Gemini performed highest in detection of hidden injections (96.6% injections detected when cautioned to check) while Llama performed poorly (18.3% injections detected). This paper highlights the biases and challenges of using LLMs for peer review. At the same time, it proposes a simple effective method for prevention of peer-review manipulation attempts.

Version published to 10.31234/osf.io/zxuwf_v2 on OSF Preprints
Jul 15, 2025
Version published to 10.31234/osf.io/zxuwf_v1 on OSF Preprints
Jul 13, 2025

Defenses against LLM prompt injections in academic peer review

This article has 1 author:
1. Ondrej Zika
This article has no evaluationsLatest version Jul 15, 2025
Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

This article has 1 author:
1. Zhicheng Lin
This article has no evaluationsLatest version Jul 8, 2025
Invisible Text Injection: The Trojan Horse of AI-Assisted Medical Peer Review

This article has 9 authors:
1. Byungjin Choi
2. Tae Joon Jun
3. Jong Won Sung
4. Ilwoo Park
5. Jeong-Moo Lee
6. Soo Ick Cho
7. Hyung Jun Park
8. Ro Woon Lee
9. Jungyo Suh
This article has no evaluationsLatest version Jul 24, 2025

Listed in

Abstract

Article activity feed

Related articles

Defenses against LLM prompt injections in academic peer review

Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

Invisible Text Injection: The Trojan Horse of AI-Assisted Medical Peer Review