Unsupervised learning of haptic material properties

Abstract

When touching the surface of an object, its spatial structure translates into a vibration pattern on the skin. The perceptual system evolved to translate this pattern into a representation that allows different materials to be distinguished. Here, we show that the perceptual haptic representation of materials emerges from efficient encoding of the vibratory patterns elicited by the interaction with materials. We trained a deep neural network (an autoencoder) with unsupervised learning to reconstruct vibratory patterns elicited by human haptic exploration of different materials. The learned compressed representation (i.e., the latent space) allows for classification of material categories (plastic, stone, wood, fabric, leather/wool, paper, and metal). More importantly, classification performance is higher with perceptual category labels than with ground-truth labels, and distances between categories in the latent space resemble perceptual distances, suggesting a similar coding. Crucially, the classification performance and the similarity between the perceptual and the latent space decrease with decreasing compression level. We further show that the temporal tuning of the emergent latent dimensions is similar to the properties of human tactile receptors.
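
As a rough illustration of the pipeline described above, the sketch below trains a small autoencoder to reconstruct pre-processed vibration segments and exposes the latent code for later classification and distance analyses. The layer sizes, latent dimensionality, and training details are illustrative assumptions, not the architecture reported in the paper.

```python
# Minimal autoencoder sketch (hypothetical sizes and training details).
import torch
import torch.nn as nn

class VibrationAutoencoder(nn.Module):
    def __init__(self, input_dim=1024, latent_dim=16):
        super().__init__()
        # Encoder: compress a vibration segment into a low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstruct the segment from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z  # reconstruction and latent code

model = VibrationAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(batch):
    """One unsupervised step: minimise reconstruction error on a (N, input_dim) batch."""
    recon, _ = model(batch)
    loss = loss_fn(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, the encoder output for each recording would serve as the compressed representation whose class separability and pairwise distances are compared with the perceptual data.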

Article activity feed

  1. This manuscript is in revision at eLife

    The decision letter after peer review, sent to the authors on February 18, 2021, follows.

    Summary:

    This paper is of potential importance to neuroscientists who study sensory representations and how they are learnt. It suggests that neural representations underlying human perception can be understood in terms of an optimal compression of the sensory input. While the attempt is interesting, there are several shortcomings that should be addressed before this work can be considered a substantial contribution to the understanding of tactile perception.

    Essential Revisions:

    1. There is a significant gap between the simulated data used here and the empirical data on material perception by touch. The vibratory signals were taken from recordings of surface exploration with a tool tip (Strese et al., 2017), whereas the ratings of the different materials come from an experiment in which participants used bare-hand touch (Baumgartner et al., 2013). The difference is especially significant when material classification, and not only texture classification, is required. It is not at all clear how vibratory signals could code hardness, warmth, elasticity, friction, 3D, etc. (see Baumgartner et al., 2013). The authors must provide a serious discussion of this gap and convince the reader that their simulations can indeed provide access to the internal representations of natural haptic touch. In the same spirit, they should explain why, and demonstrate that, the pre-processing of the vibrotactile data (cutting and filtering) makes sense for natural haptic touch.

    2. The authors should provide good reasons to convince readers that the compressed representation they found is indeed a good candidate for the biological representation. First, by the nature of the AE algorithm, it will converge to some representation in the minimal encoder dimension. Why is that a good encoding representation, and why is it a good model for the biological one? Second, the differences between the results obtained with the AE and those obtained with humans (Baumgartner et al., 2013) seem to outnumber the partial similarities found. The authors should list both differences and similarities and discuss, based on this comparison, how likely it is that the coding found here matches the coding guiding human behavior.

    3. Notion of efficiency and compression. It was not demonstrated that the main result (Figure 3A) is due to the compression and efficiency of the AE. What would happen if no AE were used and distances were measured in the raw input domain (e.g., between Fourier coefficients or principal components)? One would expect Figure 4 to account for this, and also to show that the main result deteriorates for a very wide AE. Otherwise, the main result about the correlation with the perceptual data cannot be attributed to the compressive property of the AE. (A minimal sketch of such a raw-domain baseline is given after this list.)

    4. Biological correlates of the latent representation. On the one hand, the authors claim that the AE latent representation is meant to mimic a latent representation of the haptic space, which they assume to be compressed and efficient. On the other hand, they claim that the AE representation is similar to the mechanosensory representation, which is the first biological representation, before any compression can take place (when hand movements are ignored, as done here). This needs to be clarified.

    5. Validity of the latent representation. The reconstruction error of the AE is large and systematic: only ~50% of the variance is explained, and high frequencies are systematically ignored. In the resulting latent representation, classes are poorly separable (~29%), which seems far worse than the human level (around 70% in Baumgartner et al., 2013). It would therefore be interesting to see whether the key result, i.e., the relation between the AE latent space and the perceptual distances, remains valid for a more advanced AE.
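
    As a hedged illustration of the raw-domain baseline requested in point 3, the sketch below compares how well pairwise distances computed from principal components of the raw input versus from autoencoder latent codes correlate with perceptual dissimilarities. All data, shapes, and variable names are placeholders, not the authors' actual pipeline.

    ```python
    # Placeholder baseline check (hypothetical data; not the authors' code).
    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 1024))   # pre-processed vibration signals (placeholder)
    Z = rng.standard_normal((100, 16))     # AE latent codes for the same samples (placeholder)
    perceptual = pdist(rng.standard_normal((100, 5)))  # perceptual dissimilarities (placeholder)

    # Baseline: pairwise distances between principal components of the raw input.
    raw_dist = pdist(PCA(n_components=16).fit_transform(X))
    # Model: pairwise distances in the compressed latent space.
    latent_dist = pdist(Z)

    # If both correlate with the perceptual distances to a similar degree,
    # the effect cannot be attributed to the compression performed by the AE.
    rho_raw, _ = spearmanr(raw_dist, perceptual)
    rho_latent, _ = spearmanr(latent_dist, perceptual)
    print(f"raw-input baseline: rho = {rho_raw:.2f}")
    print(f"AE latent space:    rho = {rho_latent:.2f}")
    ```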