THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Hebart et al. present a new massive multimodal dataset to support the study of visual object representation, including data measured with functional magnetic resonance imaging, magnetoencephalography, and behavioral similarity judgments. The general, condition-rich design, conducted over a thoughtfully curated and sampled set of object concepts, will be highly valuable to the cognitive/computational/neuroscience community, yielding data that will be amenable to many empirical questions beyond the field of visual object recognition. The dataset is accompanied by quality-control evaluations as well as examples of analyses that the community can re-run and explore further for building new hypotheses that can be tested with such a rich dataset.

Abstract

Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and advancing cognitive neuroscience.
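
As a concrete illustration of the behavioral component described above, the sketch below shows how the triplet odd-one-out similarity judgments might be loaded and summarized in Python. The file name and column labels are hypothetical placeholders, not the released format; consult https://things-initiative.org for the actual data layout.

```python
# Minimal sketch, assuming a flat CSV export of the triplet odd-one-out task.
# "triplets.csv" and its column names are hypothetical placeholders.
import pandas as pd

# Each row records one trial: three object concepts shown together and the
# one the participant chose as the odd one out.
triplets = pd.read_csv("triplets.csv")  # columns: concept_a, concept_b, concept_c, odd_one_out

# Basic sanity checks: total number of judgments and coverage per concept.
print(f"{len(triplets):,} similarity judgments")
coverage = pd.concat([triplets["concept_a"],
                      triplets["concept_b"],
                      triplets["concept_c"]]).value_counts()
print(coverage.describe())  # how densely each concept was sampled
```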

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The authors develop and freely disseminate the THINGS-data collection, a large-scale dataset incorporating MRI, MEG, eye-tracking, and 4.7 million similarity ratings for 1,854 object concepts. Demonstrating the reliability of their data, the authors replicate nearly a dozen previous neuroimaging papers. This "big data" approach significantly advances our ability to link behavioral measures with neuroimaging at scale, with the potential to spark future insights into how the mind represents objects.

    I thought that the article was well-written, with a sound methodological approach, high-quality results, and well-supported conclusions. I am overall enthusiastic about this work, and I think THINGS will provide an important benchmark for future big data approaches in cognitive and computational neuroscience.

    However, I thought it was also important to articulate more directly the potential insights this dataset can offer to the field. Although the authors mentioned that they "provided five examples for potential research directions", it was not clear to me what these new research directions were, given that the authors describe only replications in the results.

    We thank Reviewer 1 for their positive evaluation and their enthusiasm for our work! We have revised the manuscript to articulate more clearly and directly some potential research directions for the dataset. There are two aspects to consider: what sets these datasets apart from traditional small-scale research, and what sets them apart from other large-scale research? We elaborate on these two aspects in response to specific comments below.

    Reviewer #2 (Public Review):

    Hebart et al. present a large-scale multimodal dataset consisting of fMRI, MEG, and behavioral similarity measures towards the study of object representation in the mind and brain. The effort is immense, the methods are rigorous, the data are of reasonable quality, and the demonstrative analyses are extensive and provocative. (One small note regarding one leg of this multimodal dataset is that the fMRI design consisted of a single image presentation for 0.5 s without repetitions for most of the images; this design choice has particular analysis implications, e.g., the dataset will have more power when leveraging a priori grouping of images. However, unlike other datasets of this kind, here the number of images and how they were selected does support this analysis mode, e.g., multiple exemplars per object concept and rich accompanying metadata and behavioral data.)

    The manuscript is well-written, and the THINGS website that lets you explore the datasets is easy to navigate, delivering on the promise of making this an integrated, expanding worldwide initiative. Further, the datasets have clear complementary strengths relative to other recent large-scale datasets in terms of the way the images were sampled (not to mention being multimodal); thus, I suspect that the THINGS dataset will be heavily used by the cognitive/computational/neuroscience research community going forward.

    We would like to thank the reviewer for their positive evaluation of our work! We agree that the dataset has more power when leveraging a priori grouping of images, which is specifically the design choice we made here. We also agree that we can better highlight the strengths of our dataset relative to existing datasets, namely the multiple exemplars per object concept and the semantic breadth of the included object categories.
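
    To make this analysis mode concrete, here is a minimal sketch of grouping single-trial fMRI response estimates by object concept before further analysis; averaging the k exemplars of a concept reduces measurement noise roughly by a factor of sqrt(k). All shapes and labels below are synthetic toy stand-ins, not the released data.

    ```python
    # Minimal sketch of the "a priori grouping" analysis mode: single-trial
    # response estimates (one presentation per image) are averaged across
    # exemplars of the same object concept. Toy dimensions, synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n_concepts, n_exemplars, n_voxels = 100, 12, 1000   # toy sizes
    betas = rng.standard_normal((n_concepts * n_exemplars, n_voxels))
    concept_of_trial = np.repeat(np.arange(n_concepts), n_exemplars)

    # Average all trials that share a concept label into one pattern per concept.
    concept_means = np.stack([betas[concept_of_trial == c].mean(axis=0)
                              for c in range(n_concepts)])
    print(concept_means.shape)  # (100, 1000): one averaged pattern per concept
    ```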

    Reviewer #3 (Public Review):

    This manuscript presents a highly valuable dataset with multimodal functional human brain imaging data (fMRI and MEG) as well as behavioural annotations of the stimuli used (thousands of images from the THINGS collection, systematically covering multiple types of concrete nameable objects).

    The manuscript presents details about the dataset, quality control measures, and a careful description of preprocessing choices. The tools and approaches that were used follow the state of the art in human functional brain imaging, and I praise the authors for being transparent in their methodological approaches by also sharing their code along with the data. The manuscript also presents a few analyses of the data: 1) a multidimensional embedding of perceived similarity judgments, 2) decoding of neural representations of objects with both fMRI and MEG, 3) a replication of findings related to the visual size and animacy of objects, 4) representational similarity analysis between functional brain data and behavioural ratings, and 5) MEG-fMRI fusion.
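
    As an illustration of point 4, the following minimal sketch rank-correlates the unique entries of a behavioral representational dissimilarity matrix (RDM) with those of a neural RDM; both matrices here are random stand-ins, not the actual data.

    ```python
    # Minimal RSA sketch: rank-correlate a behavioral RDM with a neural RDM.
    # Both matrices are random symmetric stand-ins, not the released data.
    import numpy as np
    from scipy.stats import spearmanr

    def random_rdm(rng, n):
        m = rng.random((n, n))
        m = (m + m.T) / 2          # symmetrize
        np.fill_diagonal(m, 0.0)   # zero self-dissimilarity
        return m

    n = 100                        # toy number of conditions
    rng = np.random.default_rng(0)
    behavior_rdm = random_rdm(rng, n)
    neural_rdm = random_rdm(rng, n)

    # Compare only the unique pairs (lower triangle, excluding the diagonal).
    tril = np.tril_indices(n, k=-1)
    rho, p = spearmanr(behavior_rdm[tril], neural_rdm[tril])
    print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
    ```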

    We thank the reviewer for their overall positive assessment of our work!
