Observational Learning with Gated Information: A Lightweight RL Simulation Testbed and Latent Signal Models

Abstract

Observational learning can be framed as active information sampling: an agent must decide when to spend time watching another agent to reduce uncertainty about when reward will be available. We present a compact simulation testbed for this social-timing problem in which (i) a demonstrator produces brief, stochastic cue bursts, (ii) a short reward window opens after each burst ends, and (iii) the observer can access social timing features only by choosing an explicit Observe action, which yields imperfect cue detection and a noisy estimate of the window time remaining. Building on the released multi-algorithm pipeline (dueling double DQN baseline, PPO actor-critic, and a tabular Q-learning control), we add controlled manipulations of social sensing and teacher dynamics: (1) a social-blind observer with the social channel disabled, (2) an impaired-tracking observer with reduced detection reliability and noisier timing estimates, (3) a familiarity variable that gradually improves the observer’s effective sensing with repeated exposure, and (4) teacher palatability conditions that change the distribution of cue-burst (consumption) durations. We compare learning speed and reward-related latent signals across these conditions by logging behavior and synthesizing candidate internal variables (TD error/RPE, observation-driven value updates, action salience, and information-gain impulses), which are optionally filtered into signal-like traces and analyzed with peri-event averages. The framework provides a lightweight, code-aligned template for diagnosing observational RL policies and for testing which latent variables best explain event-aligned “teaching signals” under different social-information constraints.
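To make the gated-information setup concrete, the following is a minimal sketch of how such an environment might be structured. It is not the released pipeline: the class name `GatedObservationEnv`, the action labels, the parameter names, and all default values are assumptions chosen for illustration. The sketch shows only the three ingredients named in the abstract: stochastic cue bursts, a reward window that opens when a burst ends, and social features that are returned only when the agent takes the Observe action.

```python
import random
from dataclasses import dataclass

# Hypothetical action labels (not taken from the released code).
WAIT, OBSERVE, COLLECT = 0, 1, 2


@dataclass
class GatedObservationEnv:
    """Illustrative gated-observation environment; all parameters are assumed."""
    burst_prob: float = 0.05     # per-step chance that a cue burst starts when idle
    burst_len: tuple = (3, 6)    # inclusive range of burst durations, in steps
    window_len: int = 4          # reward window length after a burst ends
    detect_prob: float = 0.9     # chance that Observe detects an ongoing cue
    timing_noise: float = 1.0    # std of Gaussian noise on the window estimate
    social_enabled: bool = True  # False models the social-blind observer
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.burst_left = 0      # remaining steps of the current cue burst
        self.window_left = 0     # remaining steps of the open reward window

    def _advance(self):
        # Demonstrator dynamics: idle -> burst -> reward window -> idle.
        if self.burst_left > 0:
            self.burst_left -= 1
            if self.burst_left == 0:
                self.window_left = self.window_len
        elif self.window_left > 0:
            self.window_left -= 1
        elif self.rng.random() < self.burst_prob:
            self.burst_left = self.rng.randint(*self.burst_len)

    def step(self, action):
        reward = 0.0
        obs = (0, 0.0)  # (cue_detected, window_remaining_estimate)
        if action == COLLECT and self.window_left > 0:
            reward = 1.0
            self.window_left = 0
        elif action == OBSERVE and self.social_enabled:
            # Social features are visible only through the Observe action.
            cue = int(self.burst_left > 0
                      and self.rng.random() < self.detect_prob)
            est = 0.0
            if self.window_left > 0:
                noise = self.rng.gauss(0.0, self.timing_noise)
                est = max(0.0, self.window_left + noise)
            obs = (cue, est)
        self._advance()
        return obs, reward
```

Under these assumptions, the social-blind condition corresponds to `social_enabled=False`, and an impaired-tracking observer could be approximated by lowering `detect_prob` and raising `timing_noise`. A familiarity variable or palatability condition would adjust those sensing parameters or `burst_len` over time, which this sketch leaves out.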
