Observational Learning with Gated Information: A Lightweight RL Simulation Testbed and Latent Signal Models

Chengyuan Zhu
Jialai Wu


Abstract

Observational learning can be framed as active information sampling: an agent must decide when to spend time watching another agent to reduce uncertainty about when reward will be available. We present a compact simulation testbed for this social-timing problem in which (i) a demonstrator produces brief, stochastic cue bursts, (ii) a short reward window opens after each burst ends, and (iii) the observer can access social timing features only by choosing an explicit Observe action, with imperfect cue detection and a noisy estimate of the time remaining in the window. Building on the released multi-algorithm pipeline (dueling double DQN baseline, PPO actor-critic, and a tabular Q-learning control), we add controlled manipulations of social sensing and teacher dynamics: (1) a social-blind observer with the social channel disabled, (2) an impaired-tracking observer with reduced detection reliability and noisier timing estimates, (3) a familiarity variable that gradually improves the observer’s effective sensing with repeated exposure, and (4) teacher palatability conditions that change the distribution of cue-burst (consumption) durations. We compare learning speed and reward-related latent signals across these conditions by logging behavior and synthesizing candidate internal variables (TD error/RPE, observation-driven value updates, action salience, and information-gain impulses), which are optionally filtered into signal-like traces and analyzed with peri-event averages. The framework provides a lightweight, code-aligned template for diagnosing observational RL policies and for testing which latent variables best explain event-aligned “teaching signals” under different social-information constraints.
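The gated-observation setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released pipeline: the class name `GatedObservationEnv`, the action constants, and every parameter (`burst_start_prob`, `burst_len_range`, `window_len`, `detect_prob`, `timing_noise`, `social_enabled`) are hypothetical, and the dynamics are one plausible reading of points (i) to (iii), assuming discrete time steps and Gaussian timing noise.

```python
import random

# Hypothetical action set: the abstract names only the Observe action.
WAIT, OBSERVE, COLLECT = 0, 1, 2


class GatedObservationEnv:
    """Toy social-timing task: social features are visible only on OBSERVE."""

    def __init__(self, burst_start_prob=0.05, burst_len_range=(3, 6),
                 window_len=4, detect_prob=0.9, timing_noise=1.0,
                 social_enabled=True, seed=0):
        self.rng = random.Random(seed)
        self.burst_start_prob = burst_start_prob  # chance a cue burst starts
        self.burst_len_range = burst_len_range    # (min, max) burst duration
        self.window_len = window_len              # reward-window duration
        self.detect_prob = detect_prob            # cue detection reliability
        self.timing_noise = timing_noise          # std of window estimate noise
        self.social_enabled = social_enabled      # False = social-blind observer
        self.burst_left = 0
        self.window_left = 0

    def _advance_teacher(self):
        if self.burst_left > 0:
            self.burst_left -= 1
            if self.burst_left == 0:
                # The reward window opens when the burst ends.
                self.window_left = self.window_len
        elif self.window_left > 0:
            self.window_left -= 1
        elif self.rng.random() < self.burst_start_prob:
            self.burst_left = self.rng.randint(*self.burst_len_range)

    def step(self, action):
        """Return ((cue_detected, window_remaining_estimate), reward)."""
        reward = 0.0
        if action == COLLECT and self.window_left > 0:
            reward = 1.0
            self.window_left = 0  # collecting closes the window
        self._advance_teacher()

        obs = (0, 0.0)  # default: no social information
        if action == OBSERVE and self.social_enabled:
            # Imperfect detection of an ongoing cue burst.
            cue = int(self.burst_left > 0
                      and self.rng.random() < self.detect_prob)
            # Noisy estimate of the time remaining in the reward window.
            est = 0.0
            if self.window_left > 0:
                noise = self.rng.gauss(0.0, self.timing_noise)
                est = max(0.0, self.window_left + noise)
            obs = (cue, est)
        return obs, reward
```

Under these assumptions, the social-blind condition would correspond to `social_enabled=False`, and the impaired-tracking condition to a lower `detect_prob` combined with a higher `timing_noise`. Familiarity and palatability could be modeled by varying `detect_prob` over episodes and changing `burst_len_range`, respectively.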

Version published to 10.21203/rs.3.rs-9091145/v1 on Research Square
Mar 12, 2026

Related articles

State Estimation as a Feasibility Condition for Cognition under Partial Observability

This article has 1 author:
1. Ian S. Howard
This article has no evaluations
Latest version Mar 20, 2026

Large Language Models for Reinforcement Learning: A Survey of Intervention Operators and Optimization Effects

This article has 3 authors:
1. Kourosh Shahnazari
2. Seyed Moein Ayyoubzadeh
3. Mohammadali Keshtparvar
This article has no evaluations
Latest version Mar 3, 2026

Foundation Model for Biological Temporal Data Dynamics with Experimental Validation

This article has 2 authors:
1. Xiaoyu Duan
2. Vipul Periwal
This article has no evaluations
Latest version Mar 12, 2026
