Reasoning Distillation by Prompt Optimization
Abstract
Large Language Models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to produce reliable and interpretable outputs. However, transferring such reasoning behaviour from one system to another typically requires expensive dataset construction, full-model retraining, or large task-specific corpora for distillation. In this work, we introduce a lightweight prompt-level distillation framework that aligns a student model with the reasoning patterns of an external 'parent' source, which may be a larger model or a human expert. Instead of constructing a curated supervision dataset, our method operates solely on reasoning traces generated by the parent. These traces serve as optimization targets for an automated prompt-search procedure that improves the logical consistency and step-wise reasoning of the student without modifying its parameters. Across multiple reasoning benchmarks, we show that prompt-level distillation substantially narrows the performance gap between student and parent models while eliminating the cost of dataset preparation and model training. This approach offers a practical pathway for disseminating high-quality reasoning behaviours in settings where computational resources, data availability, or human labour are limited.
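To make the mechanism concrete, the sketch below shows one plausible form of the prompt-search loop: candidate prompts are scored by how closely the student's reasoning traces align with the parent's traces, and the best-scoring prompt is kept. It assumes the student is exposed as a plain text-in/text-out callable; the helper names, the token-overlap scorer, and the hint pool are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of prompt-level reasoning distillation.
# All names (jaccard_alignment, mutate_prompt, distill_prompt, HINTS)
# are hypothetical placeholders, not the paper's API.
import random
from typing import Callable

def jaccard_alignment(student_trace: str, parent_trace: str) -> float:
    """Crude trace-alignment score: token-level Jaccard overlap.
    A real system might use embedding similarity or an LLM judge instead."""
    a = set(student_trace.lower().split())
    b = set(parent_trace.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical edit pool for the prompt-mutation operator.
HINTS = [
    "Reason step by step before answering.",
    "State each intermediate conclusion explicitly.",
    "Check that each step follows from the previous one.",
]

def mutate_prompt(prompt: str) -> str:
    """Propose a neighbouring prompt by appending a random reasoning hint."""
    return prompt + " " + random.choice(HINTS)

def distill_prompt(student: Callable[[str], str],
                   parent_traces: dict[str, str],
                   seed_prompt: str,
                   iterations: int = 50) -> str:
    """Hill-climb over prompts so the student's reasoning traces align
    with the parent's, leaving the student's parameters untouched."""
    best_prompt, best_score = seed_prompt, -1.0
    for _ in range(iterations):
        candidate = mutate_prompt(best_prompt)
        # Average alignment of student traces to parent traces across
        # all questions for which the parent provided a trace.
        score = sum(
            jaccard_alignment(student(candidate + "\n" + q), trace)
            for q, trace in parent_traces.items()
        ) / len(parent_traces)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```

In a real pipeline, the alignment scorer and the mutation operator would be replaced by whatever trace-matching objective and prompt-edit operators the automated search procedure actually uses; the loop structure, in which parent traces act only as optimization targets, is the point of the sketch.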