Toward Zero-Human-Intervention Autonomous Robot Learning: A Continuous Result-Driven Self-Reward and Correction Framework

Abstract

Autonomous robots operating in complex real-world environments require the ability to continuously improve their behavior without human-provided reward annotation or online intervention. However, robot actions often produce delayed, multi-factor consequences, making it difficult to correctly associate later outcomes with earlier actions and to perform reliable autonomous self-reward and self-correction. In this paper, we propose a continuous result-driven self-reward and correction framework for autonomous robots. The framework enables robots to collect full-process behavioral, environmental, and internal reasoning records, perform delayed outcome discovery and temporal backtracking, generate unified internal–external self-reward signals, and revise earlier reward judgments when later evidence reveals them to be incomplete or incorrect. It also supports self-intervention for improving causal verification and policy reliability. Experimental results show that, compared with conventional baselines, the proposed framework improves delayed-outcome attribution accuracy from 15.93% to 38.36%, increases the safety score from 66.1 to 98.5, and reduces the persistent wrong reward rate by 48.77%.
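The delayed-outcome backtracking and reward-revision loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record structure, the `SelfRewardLog` class, the temporal-decay weighting, and all parameter names are assumptions introduced here for clarity.

```python
from dataclasses import dataclass


@dataclass
class ActionRecord:
    # Hypothetical full-process record of one step: the action taken
    # and the self-reward assigned at the time it was executed.
    step: int
    action: str
    reward: float


class SelfRewardLog:
    """Illustrative log supporting delayed-outcome discovery, temporal
    backtracking, and retroactive reward revision. All names here are
    assumptions, not the framework's actual API."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay  # assumed: credit decays with temporal distance
        self.records: list[ActionRecord] = []

    def append(self, step: int, action: str, reward: float) -> None:
        self.records.append(ActionRecord(step, action, reward))

    def backtrack_and_revise(self, outcome_step: int, outcome_value: float,
                             window: int = 5) -> None:
        # When a delayed outcome is discovered at `outcome_step`, backtrack
        # through earlier records within `window` steps, attribute the
        # outcome to them weighted by temporal proximity, and revise their
        # stored rewards in place.
        for rec in self.records:
            gap = outcome_step - rec.step
            if 0 < gap <= window:
                rec.reward += outcome_value * (self.decay ** gap)


log = SelfRewardLog()
for t in range(4):
    log.append(step=t, action=f"a{t}", reward=0.0)

# A delayed positive outcome observed at step 4 is attributed back to
# the earlier actions, revising their initially neutral rewards.
log.backtrack_and_revise(outcome_step=4, outcome_value=1.0)
```

In this sketch, the most recent action (closest to the outcome) receives the largest revised reward, and credit falls off geometrically for earlier steps; the framework's actual attribution mechanism may differ.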
