  1. Reviewer #2 (Public Review):

    This study applies an unsupervised learning approach to assess acoustic similarity and to classify animal vocalizations. The investigation focuses on mouse vocalizations and song learning in zebra finches. The method demonstrates an impressive capacity to map and compare vocal sounds in both species and to assess vocal learning. It has clear advantages over existing methods. It is still an open question to what extent this approach can successfully capture vocal development during early stages of song learning. In particular, the learned latent features have no simple interpretation in terms of the production and perception of vocal sounds, which future studies will need to address.

  2. Reviewer #1 (Public Review):

    In this paper, the authors addressed the reviewers' concerns and expanded extensively on the utility of the variational autoencoder (VAE). The authors included an extra section discussing the VAE's capability to handle more complicated scenarios by studying a tutor and pupil song learning experiment. One can readily visualize the differences between tutor and pupil syllables via the latent embeddings. Although the latent features can be hard to interpret, one could view them as an initial exploratory analysis for identifying possible discrepancies in acoustic structure. The authors also included additional data benchmarking latent features against conventional acoustic features for classification tasks and offered a more in-depth study comparing the clustering of song syllables using traditional acoustic features and VAE latent features. Moreover, for completeness, they discussed the effect of the time stretch and frequency spacing parameters on SAP feature prediction, as well as the VAE's replicability issue.

    The new Figure 7 showing tutor-pupil analyses is a welcome addition to the paper.

    While it remains uncertain whether this method will actually supersede others in quantifying finch and/or mouse datasets, this paper could, at minimum, provide a case study of the advantages and disadvantages of using the VAE approach for vocalization datasets.

  3. Evaluation Summary:

    The revised manuscript is much improved and addresses most of my concerns. The new direct comparison with SAP, MUPET & DeepSq clearly demonstrates the advantages of the learned latent acoustic features. The similarity measurement of zebra finch songs seems to work even in more difficult cases of fused syllables. However, there are still a few places (particularly in Fig. 7) where the presentation is not clear enough. Also, I wish the authors had included some demonstration of song development analysis.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)
