An evolutionary systems approach to language and music: measuring predictive dynamics with viewpoint graph networks

Abstract

Humans are unique among social animals in having both music and language. A key approach to studying their evolution is identifying shared traits, such as vocal learning or entrainment, across species and mapping them onto phylogenetic trees (Honing et al., 2015). Relatedly, cross-cultural research has revealed near-universal features specific to, or shared by, music and language (Savage et al., 2015). However, many crucial traits emerged after our split from chimpanzees, limiting this method's explanatory power. Understanding the evolutionary sequence and interaction of these traits could refine hypotheses about protolanguage and protomusic systems and their evolutionary relationship.

We propose adapting an evolutionary systems biology (ESB) framework that aims to identify causal interactions in evolving systems and to define their possible evolutionary trajectories. Using dynamical systems modeling, this approach provides mechanistic explanations for phenotypic development and evolutionary transitions. The key domain for assessing evolvability and robustness is the component dynamics of the system; in the original ESB framework, these are the gene expression dynamics linking genotype to phenotype (Jaeger & Monk, 2021). In transferring this approach, we suggest investigating the domain of predictive dynamics mediating between cognitive traits ("genotypic") and spectrotemporal features ("phenotypic").

Here we attempted an ESB implementation by investigating predictability in song, instrumental music, and recited and descriptive speech from a cross-cultural dataset (Ozaki et al., 2024). We conceptualise predictive dynamics as the interface between song/speech acoustics and cognitive capacities/goals, and we quantified it using the information-theoretic model IDyOM (Pearce, 2005). This model operationalises auditory perception through "viewpoints" such as pitch and duration. We computed information content and entropy across viewpoints and examined the resulting viewpoint networks with graph-theoretic measures.

We found that mean information content increased from descriptive speech to recited speech to song and instrumental music, forming a continuum with the highest predictability in instrumental music. However, we also observed that both network topologies and the observed predictive patterns were similar across conditions, suggesting that the construction and choice of viewpoints is highly influential. We also found that song and speech differ in specific subnetworks, suggesting that the same set of perceptual dimensions may still give rise to different predictive dynamics. We propose that a framework from ESB can be applied to cognition to gain empirical insight into music and language evolution. We discuss current limitations of, and future directions for, operationalising predictive dynamics using information-theoretic viewpoint networks to investigate the evolutionary differentiation of song and speech.
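To make the information-theoretic quantities concrete, the sketch below computes per-event information content (surprisal) and predictive entropy over a toy pitch-viewpoint sequence. This is a minimal illustration, not the authors' pipeline: IDyOM itself uses variable-order Markov models, whereas this sketch uses a simple first-order model with add-one smoothing, and the function and variable names (`information_dynamics`, `pitches`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def predictive_distribution(context, counts, alphabet):
    """First-order distribution P(next | previous) with add-one smoothing."""
    ctx_counts = counts[context]
    total = sum(ctx_counts.values()) + len(alphabet)
    return {s: (ctx_counts.get(s, 0) + 1) / total for s in alphabet}

def information_dynamics(sequence):
    """Per-event information content (bits) and entropy of the prediction,
    computed online as each event is observed (cf. IDyOM's short-term model)."""
    alphabet = sorted(set(sequence))  # sketch simplification: alphabet known in advance
    counts = defaultdict(Counter)
    ics, entropies = [], []
    for prev, nxt in zip(sequence, sequence[1:]):
        dist = predictive_distribution(prev, counts, alphabet)
        ics.append(-math.log2(dist[nxt]))  # IC = -log2 P(observed event)
        entropies.append(-sum(p * math.log2(p) for p in dist.values()))  # uncertainty of prediction
        counts[prev][nxt] += 1  # update the model after prediction (online learning)
    return ics, entropies

# Hypothetical pitch-viewpoint sequence (MIDI note numbers).
pitches = [60, 62, 64, 62, 60, 62, 64, 62, 60, 64]
ic, h = information_dynamics(pitches)
print(f"mean IC = {sum(ic)/len(ic):.2f} bits, mean entropy = {sum(h)/len(h):.2f} bits")
```

On this reading, lower mean information content corresponds to a more predictable signal, which is how a continuum of predictability across speech and music conditions can be compared.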
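Similarly, one way to realise "viewpoint networks" analysed with graph-theoretic measures (again a hedged sketch under assumptions, not the paper's method) is to treat each viewpoint's information-content time series as a node and link viewpoints whose dynamics correlate. The viewpoint names, the correlation rule, and the threshold below are illustrative choices.

```python
import itertools
import networkx as nx
import numpy as np

def viewpoint_graph(ic_series, threshold=0.3):
    """Weighted graph over viewpoints: nodes are viewpoints, edges link pairs
    whose IC time series correlate above `threshold` (illustrative rule)."""
    g = nx.Graph()
    g.add_nodes_from(ic_series)
    for a, b in itertools.combinations(ic_series, 2):
        r = float(np.corrcoef(ic_series[a], ic_series[b])[0, 1])
        if abs(r) >= threshold:
            g.add_edge(a, b, weight=abs(r))
    return g

# Hypothetical per-viewpoint information-content series for one recording.
rng = np.random.default_rng(0)
ic_series = {name: rng.normal(size=50)
             for name in ["pitch", "duration", "interval", "contour"]}
g = viewpoint_graph(ic_series)

# Standard graph-theoretic summaries one might compare across song/speech conditions.
print("density:", nx.density(g))
print("clustering:", nx.average_clustering(g, weight="weight"))
print("degree centrality:", nx.degree_centrality(g))
```

Comparing such summaries across conditions is one concrete way to ask whether song and speech share an overall network topology while differing in specific subnetworks.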
