Toward Zero-Human-Intervention Autonomous Robot Learning: A Continuous Result-Driven Self-Reward and Correction Framework
Abstract
Autonomous robots operating in complex real-world environments must continuously improve their behavior without human-provided reward annotation or online intervention. However, robot actions often have delayed, multi-factor consequences, making it difficult to correctly associate later outcomes with earlier actions and thus to perform reliable autonomous self-reward and self-correction. In this paper, we propose a continuous result-driven self-reward and correction framework for autonomous robots. The framework enables a robot to collect full-process behavioral, environmental, and internal-reasoning records; perform delayed-outcome discovery and temporal backtracking; generate unified internal-external self-reward signals; and revise earlier reward judgments when later evidence reveals them to be incomplete or incorrect. It also supports self-intervention to improve causal verification and policy reliability. Experimental results show that, compared with conventional baselines, the proposed framework improves delayed-outcome attribution accuracy from 15.93% to 38.36%, increases the safety score from 66.1 to 98.5, and reduces the persistent wrong-reward rate by 48.77%.
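The temporal-backtracking and reward-revision step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Record` schema, the linear-decay credit rule, and all parameter names are assumptions standing in for the framework's causal-verification machinery.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One step of the robot's full-process log (hypothetical schema)."""
    step: int
    action: str
    reward: float          # provisional self-reward assigned at the time
    revised: bool = False  # set when a delayed outcome triggers revision

def backtrack_and_revise(log, outcome_step, window, outcome_reward):
    """On discovering a delayed outcome at `outcome_step`, revisit the
    preceding `window` records and revise their provisional rewards.
    Revision here is a simple overwrite with linear temporal decay --
    a stand-in for the framework's unified self-reward signal."""
    for r in log:
        gap = outcome_step - r.step
        if 0 < gap <= window:
            # closer steps receive more credit for the delayed outcome
            r.reward = outcome_reward * (1 - gap / (window + 1))
            r.revised = True
    return log

# Usage: a delayed outcome at step 4 revises the two preceding records.
log = [Record(i, f"act{i}", 0.0) for i in range(5)]
backtrack_and_revise(log, outcome_step=4, window=2, outcome_reward=1.0)
```

Under this toy rule, the record at step 3 (gap 1) receives more credit than the one at step 2 (gap 2), while records outside the backtracking window keep their original provisional rewards.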