Toward Zero-Human-Intervention Autonomous Robot Learning: A Continuous Result-Driven Self-Reward and Correction Framework
Abstract
Autonomous robots operating in complex real-world environments must continuously improve their behavior without human-provided reward annotation or online intervention. However, robot actions often have delayed, multi-factor consequences, making it difficult to correctly associate later outcomes with earlier actions and thus to perform reliable autonomous self-reward and self-correction. In this paper, we propose a continuous result-driven self-reward and correction framework for autonomous robots. The framework enables a robot to collect full-process behavioral, environmental, and internal-reasoning records; perform delayed-outcome discovery and temporal backtracking; generate unified internal-external self-reward signals; and revise earlier reward judgments when later evidence reveals them to be incomplete or incorrect. It also supports self-intervention to improve causal verification and policy reliability. Experimental results show that, compared with conventional baselines, the proposed framework improves delayed-outcome attribution accuracy from 15.93% to 38.36%, increases the safety score from 66.1 to 98.5, and reduces the persistent wrong-reward rate by 48.77%.
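The temporal-backtracking and reward-revision step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Record` schema, the linear-decay credit rule, and all parameter names are assumptions standing in for the framework's causal-verification machinery.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One step of the robot's full-process log (hypothetical schema)."""
    step: int
    action: str
    reward: float          # provisional self-reward assigned at the time
    revised: bool = False  # set when a delayed outcome triggers revision

def backtrack_and_revise(log, outcome_step, window, outcome_reward):
    """On discovering a delayed outcome at `outcome_step`, revisit the
    preceding `window` records and revise their provisional rewards.
    Revision here is a simple overwrite with linear temporal decay --
    a stand-in for the framework's unified self-reward signal."""
    for r in log:
        gap = outcome_step - r.step
        if 0 < gap <= window:
            # closer steps receive more credit for the delayed outcome
            r.reward = outcome_reward * (1 - gap / (window + 1))
            r.revised = True
    return log

# Usage: a delayed outcome at step 4 revises the two preceding records.
log = [Record(i, f"act{i}", 0.0) for i in range(5)]
backtrack_and_revise(log, outcome_step=4, window=2, outcome_reward=1.0)
```

Under this toy rule, the record at step 3 (gap 1) receives more credit than the one at step 2 (gap 2), while records outside the backtracking window keep their original provisional rewards.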