Assessing the Response Strategies of Large Language Models Under Uncertainty: A Comparative Study Using Prompt Engineering

Abstract

The ability of artificial intelligence to understand and generate human language has transformed a wide range of applications, enhancing both interaction and decision-making. Evaluating the fallback behaviors of language models under uncertainty offers a novel approach to understanding and improving their performance in ambiguous or conflicting scenarios. This study systematically analyzed ChatGPT and Claude using a series of carefully designed prompts that introduced different types of uncertainty: ambiguous questions, vague instructions, conflicting information, and insufficient context. Automated scripts ensured consistency in data collection, and responses were evaluated on accuracy, consistency, fallback mechanisms, response length, and complexity. The results revealed significant differences in how ChatGPT and Claude handle uncertainty, with ChatGPT demonstrating higher accuracy, greater stability, and more frequent use of proactive strategies for managing ambiguous inputs. These findings offer valuable guidance for the ongoing development and refinement of language models, underscoring the importance of integrating advanced fallback mechanisms and adaptive response strategies to enhance robustness and reliability.
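To make the described methodology concrete, the following is a minimal sketch of the kind of automated harness the abstract outlines: it sends uncertainty-typed prompts to each model several times, then scores the replies on simple proxies for the paper's metrics (fallback usage, response length, and consistency across repeats). The prompt set, the `query_model` stub, and the `FALLBACK_MARKERS` list are illustrative assumptions, not the authors' actual materials or scoring rubric.

```python
import re
import statistics
from collections import defaultdict

# Illustrative examples of the four uncertainty types named in the abstract.
UNCERTAINTY_PROMPTS = {
    "ambiguous_question": "What is the bank near the river?",
    "vague_instruction": "Make it better.",
    "conflicting_information": "The meeting is at 3 pm. The meeting is at 5 pm. When is the meeting?",
    "insufficient_context": "Why did she change her answer?",
}

# Phrases that signal a fallback strategy, such as asking for clarification
# or flagging the ambiguity instead of guessing. Hypothetical list.
FALLBACK_MARKERS = [
    r"could you clarify",
    r"more context",
    r"ambiguous",
    r"not enough information",
    r"conflicting",
]

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for a real API call to ChatGPT or Claude. Replace with
    the vendor SDK of your choice; here it returns canned text so the
    script runs standalone."""
    return f"[{model_name}] Could you clarify what you mean by: {prompt!r}?"

def uses_fallback(response: str) -> bool:
    text = response.lower()
    return any(re.search(marker, text) for marker in FALLBACK_MARKERS)

def evaluate(models, trials: int = 3):
    """Run each prompt several times per model and collect simple metrics."""
    results = defaultdict(dict)
    for model in models:
        for label, prompt in UNCERTAINTY_PROMPTS.items():
            replies = [query_model(model, prompt) for _ in range(trials)]
            lengths = [len(r.split()) for r in replies]
            results[model][label] = {
                "fallback_rate": sum(uses_fallback(r) for r in replies) / trials,
                "mean_length": statistics.mean(lengths),
                # Crude consistency proxy: all repeats identical.
                "consistent": len(set(replies)) == 1,
            }
    return results

if __name__ == "__main__":
    for model, scores in evaluate(["chatgpt", "claude"]).items():
        print(model, scores)
```

Running identical prompts multiple times per model, as done here, is what allows a consistency metric to be computed at all; a single-shot design would conflate a model's variability with its accuracy.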
