MetaThink: Empowering Large Reasoning Models with Adaptive Self-Correction at Inference Time


Abstract

Large Reasoning Models (LRMs) face a fundamental challenge in balancing efficient "fast thinking" with accurate "slow thinking": they often struggle to adaptively trigger deeper reasoning without incurring significant computational overhead. This paper introduces \( \textit{MetaThink (MT)} \), a novel inference-time adaptive refinement framework designed to imbue LRMs with conditional self-correction capabilities without requiring any additional training. \( \textit{MetaThink} \) begins with an initial "fast thinking" phase, followed by a lightweight self-monitoring mechanism that assesses confidence through uncertainty markers. When low confidence or potential errors are detected, a refinement token triggers a targeted "slow thinking" phase guided by domain-specific prompts. This allows the model to introspectively review and correct its reasoning, culminating in a more accurate final answer. Our comprehensive evaluation across diverse and challenging benchmarks—spanning mathematical reasoning, code generation, and scientific problem-solving tasks—demonstrates that \( \textit{MetaThink} \) consistently achieves substantial and robust improvements in Pass@1 accuracy. Crucially, these gains are realized while maintaining competitive or even improved inference efficiency, outperforming existing inference-time baselines. Our findings underscore that \( \textit{MetaThink} \) offers an effective, training-free approach to enhancing the reliability and accuracy of LRMs on complex reasoning tasks by striking a superior balance between performance and efficiency.
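The fast-think/monitor/refine loop described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names (`fast_think`, `detect_uncertainty`, `slow_think`), the marker list, and the refinement prompt wording are all hypothetical placeholders for whatever the paper's actual mechanism uses.

```python
# Hedged sketch of the MetaThink inference loop from the abstract.
# All helper names and the uncertainty-marker heuristic are assumptions;
# the abstract only specifies the three-phase structure.

UNCERTAINTY_MARKERS = ("maybe", "not sure", "i think", "possibly")


def fast_think(model, prompt):
    """Phase 1: a single low-cost generation pass."""
    return model(prompt)


def detect_uncertainty(answer):
    """Lightweight self-monitoring: flag low confidence by
    scanning the draft for surface uncertainty markers."""
    text = answer.lower()
    return any(marker in text for marker in UNCERTAINTY_MARKERS)


def slow_think(model, prompt, draft):
    """Phase 2 (conditional): targeted refinement of the draft,
    guided by a (hypothetical) domain-specific review prompt."""
    refine_prompt = (
        f"{prompt}\n\nDraft answer: {draft}\n"
        "Review the reasoning step by step and correct any errors."
    )
    return model(refine_prompt)


def metathink(model, prompt):
    """Run fast thinking; escalate to slow thinking only when the
    self-monitor detects low confidence in the draft."""
    draft = fast_think(model, prompt)
    if detect_uncertainty(draft):
        return slow_think(model, prompt, draft)
    return draft
```

The key design point the abstract emphasizes is that the expensive refinement pass runs only when the monitor fires, which is why accuracy gains can come with competitive inference cost.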
