What Are You Still Waiting For? Real-Time Lexical Access Requires Encapsulated Auditory Memory

Hyoju Kim
John Baltazar Muegge
Bob McMurray

Abstract

Speech perception entails the integration of multiple acoustic cues that unfold asynchronously. While conventional work posits that listeners immediately update higher-level interpretations as cues become available, emerging evidence suggests that processing of certain speech sounds (e.g., voiceless sibilant fricatives) may be buffered, with categorical judgments temporarily withheld until subsequent cues (such as a following vocoid) arrive. This study examined the mechanisms underlying this buffered integration, testing whether its release is triggered by specific incoming information or by a fixed temporal threshold, and examined potential sources of the buffering effect. We manipulated frication duration for /s/ and /ʃ/ to canonical, doubled, and tripled lengths. Participants (n=56) completed a Visual World Paradigm (VWP) task to determine when recognition of a fricative occurred, and a Visual Analog Scaling (VAS) task to assess categorization profiles and determine whether these profiles predict individual differences in buffering. The VWP results replicated prior findings: for canonical-length fricatives, listeners delayed processing until frication offset. For double-length fricatives, this buffering was extended, with listeners withholding decisions until the entire fricative duration had elapsed. When fricative lengths were tripled, partial commitment emerged during the frication period, yet final decisions remained contingent on later cues. These results suggest that buffering is largely governed by the availability of acoustic information over time, though listeners may integrate partial information at extreme durations. VAS results revealed that listeners with greater sensitivity to secondary cues showed faster recognition. Together, these findings underscore the flexible nature of speech perception, highlighting how listeners strategically modulate processing to optimize cue integration.

Version published to 10.31234/osf.io/anv8b_v1 on OSF Preprints
Sep 11, 2025

Related articles

Statistical learning in the auditory modality: Identities are all you need

This article has 4 authors:
1. Ananya Mandal
2. Thomas Geyer
3. Heinrich R Liesefeld
4. Artyom Zinchenko
This article has no evaluations. Latest version: Sep 24, 2025

Salient features of task-irrelevant continuous speech distort subjective time

This article has 3 authors:
1. Ashley E Symons
2. Frederic Dick
3. Adam Tierney
This article has no evaluations. Latest version: Oct 16, 2025

Lexical Knowledge Enhances Consistency in Speech Categorization

This article has 4 authors:
1. Sita Carraturo
2. Hyoju Kim
3. Ethan Kutlu
4. Bob McMurray
This article has no evaluations. Latest version: Oct 2, 2025
