Words don't matter: effects of minor prompt lexical changes on large language models' unstructured responses
Abstract
Prompt engineering has become an essential skill for AI engineers and data scientists, as well-crafted prompts yield better results at lower cost. While research has extensively studied different aspects of prompts (structure, formatting, and strategy), very little work has explored the impact of minor lexical changes, such as single-character or single-word modifications. Although it is well documented that such changes affect model outputs in diverse ways, most studies compare outputs by measuring accuracy or structure. Little research has examined how small changes affect the meaning of unstructured outputs while accounting for the stochastic nature of large language model (LLM) generation. This work systematically explores these effects through experiments spanning several examples and model sizes. The results suggest that paraphrasing and word-choice changes do not alter the substance of an answer, but that special attention should be paid to typos and to the correct use of negations and affirmations.
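The abstract does not specify the measurement protocol, but one plausible setup for this kind of experiment is sketched below: sample several completions per prompt variant (to account for stochastic decoding) and compare the response sets via sentence-embedding similarity. The embedding model, the prompt variants, and the query_llm stub are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch: measure whether a minor lexical change in a prompt
# shifts the meaning of an unstructured LLM response.
from sentence_transformers import SentenceTransformer, util

def query_llm(prompt: str, n_samples: int = 3) -> list[str]:
    """Placeholder for a real LLM call; returns canned responses so the
    sketch runs without API access."""
    return [f"Example response {i} to: {prompt}" for i in range(n_samples)]

base_prompt = "Explain why the sky is blue."
variants = {
    "paraphrase": "Explain the reason the sky appears blue.",
    "typo": "Explain why the sky is bleu.",
    "negation": "Explain why the sky is not blue.",
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
base_embeddings = embedder.encode(query_llm(base_prompt))

for name, prompt in variants.items():
    variant_embeddings = embedder.encode(query_llm(prompt))
    # Mean pairwise cosine similarity between the two response sets;
    # sampling several completions per prompt smooths over stochasticity.
    similarity = util.cos_sim(base_embeddings, variant_embeddings).mean().item()
    print(f"{name}: mean semantic similarity to baseline = {similarity:.3f}")
```

Under this setup, a paraphrase that leaves the answer's substance intact should score close to the baseline's self-similarity, while a negation that flips the answer's meaning should score noticeably lower.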