Condensed Reasoning Prompting: Efficient Strategies, Evaluations, and Trade-Offs in Large Language Model Reasoning

Abstract

Recent advancements in large language models (LLMs) have demonstrated that explicitly prompting for intermediate reasoning steps significantly improves performance on complex tasks. Traditional chain of thought (CoT) prompting, however, can produce verbose outputs that increase both latency and computational cost. Condensed Reasoning Prompting (CRP) addresses this trade-off by encouraging more concise reasoning traces while maintaining high accuracy. In this paper, we systematically evaluate three prompting strategies: chain of thought (CoT), chain of draft (CoD), and condensed reasoning (CRP), across multiple datasets, including MMLU, BIG-Bench, arithmetic, and symbolic reasoning tasks. We report accuracy, average tokens per question, and a token effectiveness metric (accuracy divided by token count). Our experiments are conducted in a zero-shot setting, without specific system instructions to "skip reasoning," providing a more realistic assessment of model capabilities. Results indicate that condensed prompts often match or exceed chain of thought accuracy while reducing token usage, thus offering significant gains in efficiency. We discuss the implications for real-world deployments, highlighting how CRP can enable more efficient LLM applications without compromising performance.
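As a minimal sketch of the token effectiveness metric defined above (accuracy divided by average tokens per question), the following uses made-up illustrative numbers, not results from the paper:

```python
def token_effectiveness(accuracy: float, avg_tokens: float) -> float:
    """Token effectiveness: accuracy divided by average tokens per question.

    A higher value means the prompting strategy buys more accuracy
    per token of model output.
    """
    if avg_tokens <= 0:
        raise ValueError("avg_tokens must be positive")
    return accuracy / avg_tokens

# Hypothetical numbers for illustration only (not the paper's results):
# a verbose CoT run vs. a condensed run with similar accuracy.
cot_eff = token_effectiveness(accuracy=0.80, avg_tokens=200)
crp_eff = token_effectiveness(accuracy=0.79, avg_tokens=60)

print(f"CoT token effectiveness: {cot_eff:.4f}")
print(f"CRP token effectiveness: {crp_eff:.4f}")
```

Under these assumed numbers, the condensed strategy scores higher on the metric despite slightly lower accuracy, which is the trade-off the abstract describes.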
