A battery of image classification challenges reveals shared and distinct object categorization behavior across monkeys, humans, and deep networks

Han Zhang
Zhihao Zheng
Jiaqi Hu
Qiao Wang
Mengya Xu
Zhaojiayi Zhou
Zixuan Li
Gouki Okazawa

Curated by eLife

eLife Assessment

This study provides fundamental insights into the mechanisms of visual object categorization in primates through a scalable behavioral framework for assessing category learning and generalization in macaque monkeys. The evidence is compelling, based on extensive behavioral characterization, rigorous control experiments, and comprehensive comparisons with humans and computational models, although extending the model analyses to the secondary monkey experiments would further strengthen the conclusions.

Abstract

Humans categorize objects at multiple levels of abstraction: animate versus inanimate, big versus small, and many other attributes. Although such categorization appears challenging, deep neural networks (DNNs) have demonstrated that complex visual processing alone can support it without language or human-specific knowledge. This raises a natural question: to what extent can non-human primates, without language, perform such categorization? Although basic object-recognition behavior in monkeys, such as similarity judgment, has been extensively studied, their ability to classify objects across diverse rules remains poorly characterized. Here, we developed a task paradigm that enabled us to train monkeys on a large battery of binary classification tasks using natural object images, spanning more than 10 rules, such as animate versus inanimate, natural versus man-made objects, and mammalian versus non-mammalian animals. Monkeys acquired each rule in a few days, generalized the learned rules to new images, and exhibited error patterns consistent with human judgments. At the same time, their classification performance correlated more strongly with that of visual DNNs trained without language input, whereas human performance was better explained by language-informed DNNs. These results provide an important benchmark for the capacity of biological neural networks to perform image classification without language and human-specific knowledge.

Version published to 10.7554/elife.111725.1 on eLife
Jul 2, 2026
Version published to 10.7554/elife.111725 on eLife
Jul 2, 2026
eLife
Jul 1, 2026

Reviewer #1 (Public review):

Summary:

This study presents a systematic behavioral characterization of object classification abilities in macaque monkeys using a high-throughput touchscreen-based paradigm. The work shows that monkeys can learn and generalize many binary object classification rules, and compares their behavior with humans and computational models. A key finding is that monkey behavior is more closely aligned with visual deep neural networks, whereas human behavior is better captured by language-informed models. The study provides a useful benchmark for understanding visually grounded object categorization in nonhuman primates.

Strengths:

The study introduces a scalable and well-controlled behavioral paradigm for testing many object classification rules in macaques. The comparison across monkeys, humans, and computational models is a major strength and makes the work broadly relevant to visual neuroscience, comparative cognition, and computational modeling. The results provide an informative framework for distinguishing categorization based primarily on visual representations from categorization supported by semantic or language-based knowledge.

Weaknesses:

Some aspects of the interpretation would benefit from clarification. In particular, it remains somewhat unclear what stimulus-level factors drive image difficulty, how much training performance reflects general rule learning versus repeated reinforcement of specific images, and whether monkeys and humans apply the same category rules. The link between macaque IT representations and monkey behavior is also suggestive but not yet fully resolved, given the limited and separate neural dataset.

Reviewer #2 (Public review):

Summary:

The paper tackles a very interesting question and provides a solid and systematic dataset that may be useful for numerous NeuroAI works in the future. The question is how well macaque monkeys, with a "pretrained" visual system but without human knowledge, can learn to categorize images based on different kinds of (sometimes arbitrary) category definitions. In general, I love the paper, and I think both the data and the presentation of it are beautiful.

Strengths:

(1) The authors developed a scalable method for training and studying this behavior, and did an exhaustive evaluation of monkeys' behavior and learning process.

(2) Beyond the behavior result, they performed extensive analysis and control experiments to isolate the cue monkeys are using to perform the categorization.

(3) The extensive comparison of behavior with deep neural networks is also super interesting.

(4) The authors performed a very careful examination of generalization behavior in monkeys, similar to standard practice in machine learning.

(5) The presentation of the data is very beautiful and deliberately designed, kudos to the authors for their efforts!

(6) I really enjoyed the further categorization task based on human knowledge, and the arbitrary rule task; this really pushes our understanding of the visual categorization and learning capability of monkeys.

(7) The examination of *learning dynamics* in humans versus monkeys is also quite interesting, i.e., humans can "understand the rule" and learn much faster, whereas monkeys learn over a few days.

Weaknesses:

(1) Though all results are pretty cool, the organization of results, figures, and sections can be modified to flow even better.

(2) Maybe provide DNN categorization and generalization results for the non-main monkey experiments (Figures 2 and 3); those comparisons could be really interesting too!

Author response:

We sincerely thank the editors and reviewers for their time and thoughtful feedback on our manuscript. The reviewers' constructive comments have been very helpful in guiding our revision plan. Below, we outline our plan.

In response to Reviewer #1's comments on clarifying the factors that affect image difficulty and categorization rules, we will implement several revisions. First, to clarify what drives image difficulty, we will test whether image typicality within categories, quantified using methods such as Kramer et al. (2023; Sci Adv 9.17: eadd2981), can explain monkey categorization performance. Second, we will examine whether performance on generalization images depends on their similarity to specific repeated images and on their category typicality. Third, to address whether monkeys and humans apply similar category rules, we will focus on images for which monkeys consistently made errors and examine whether these same images also yielded lower performance (i.e., longer reaction times) in humans.
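As a rough illustration of the first planned analysis, the sketch below computes a Spearman rank correlation between per-image typicality and per-image accuracy. It is not the authors' code: the choice of Spearman correlation, the variable names, and all numbers are assumptions invented for illustration, and the typicality scores would in practice come from a method such as the one cited above.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-image values (assumptions, not data from the study).
typicality = [0.91, 0.75, 0.40, 0.62, 0.15]
monkey_accuracy = [0.98, 0.90, 0.65, 0.60, 0.55]

rho = spearman(typicality, monkey_accuracy)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here only because it does not assume a linear relation between typicality and accuracy; a regression with additional predictors would be an equally plausible choice.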

Reviewer #1 also raised an important question about how well macaque IT representations and behavior align. The IT categorization performance estimated in our manuscript is currently lower than monkey behavior, but this may reflect the limited number of recorded neurons. We will estimate ceiling IT performance as a function of neuron count and compare it with monkey and human behavior.
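One simple way to picture such an estimate is to measure decoding accuracy at several subsampled neuron counts and extrapolate to a large population. The sketch below does this under a strong assumption that is ours, not the authors': accuracy follows `ceiling - slope / n`, so the intercept of a straight-line fit against `1/n` is the extrapolated ceiling. The function name and the accuracy values are hypothetical.

```python
def fit_ceiling(neuron_counts, accuracies):
    """Least-squares fit of accuracy = ceiling - slope / n.

    Returns (ceiling, slope). The functional form is an assumption
    made for illustration only.
    """
    x = [1.0 / n for n in neuron_counts]
    mx = sum(x) / len(x)
    my = sum(accuracies) / len(accuracies)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, accuracies))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx          # regression slope on 1/n (negative if accuracy rises with n)
    ceiling = my - beta * mx  # intercept: predicted accuracy as n grows without bound
    return ceiling, -beta

# Hypothetical decoder accuracies at subsampled neuron counts
# (invented numbers, not results from the study).
neuron_counts = [25, 50, 100, 200]
accuracies = [0.82, 0.86, 0.88, 0.89]

ceiling, slope = fit_ceiling(neuron_counts, accuracies)
print(f"extrapolated ceiling = {ceiling:.3f}")
```

In a real analysis, each accuracy would typically be averaged over many random neuron subsamples with a cross-validated decoder, and other saturating forms could fit better than the `1/n` form assumed here.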

In response to Reviewer #2's suggestion to enhance narrative flow, we will reorganize the text and adjust the ordering of certain figures and sections to ensure smoother transitions between findings and analyses. Specifically, we will more clearly state which parts of the manuscript establish monkeys' categorization ability and which parts compare their behavior with models or humans before performing a triangular comparison across all three.

Regarding Reviewer #2's suggestion to test DNN performance on control experiments (non-natural stimuli, arbitrary categorization), we agree this is an excellent addition. We will perform these analyses and plan to report the results in the revised manuscript.

We believe these revisions will substantially strengthen the manuscript and fully address the reviewers' feedback.

Version published to 10.1101/2025.04.02.646407 on bioRxiv
Apr 3, 2025
