Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Increases in the scale and complexity of behavioral data pose an increasing challenge for data analysis. A common strategy involves replacing entire behaviors with small numbers of handpicked, domain-specific features, but this approach suffers from several crucial limitations. For example, handpicked features may miss important dimensions of variability, and correlations among them complicate statistical testing. Here, by contrast, we apply the variational autoencoder (VAE), an unsupervised learning method, to learn features directly from data and quantify the vocal behavior of two model species: the laboratory mouse and the zebra finch. The VAE converges on a parsimonious representation that outperforms handpicked features on a variety of common analysis tasks, enables the measurement of moment-by-moment vocal variability on the timescale of tens of milliseconds in the zebra finch, provides strong evidence that mouse ultrasonic vocalizations do not cluster as is commonly believed, and captures the similarity of tutor and pupil birdsong with qualitatively higher fidelity than previous approaches. In all, we demonstrate the utility of modern unsupervised learning approaches to the quantification of complex and high-dimensional vocal behavior.

Article activity feed