Impact of analytic decisions on test-retest reliability of individual and group estimates in functional magnetic resonance imaging: a multiverse analysis using the monetary incentive delay task

Michael I. Demidenko
Jeanette A. Mumford
Russell A. Poldrack


Abstract

Empirical studies reporting low test-retest reliability of individual blood oxygen-level dependent (BOLD) signal estimates in functional magnetic resonance imaging (fMRI) data have resurrected interest among cognitive neuroscientists in methods that may improve reliability in fMRI. Over the last decade, several individual studies have reported that modeling decisions, such as smoothing, motion correction and contrast selection, may improve estimates of test-retest reliability of BOLD signal estimates. However, it remains an empirical question whether certain analytic decisions consistently improve individual and group level reliability estimates in an fMRI task across multiple large, independent samples. This study used three independent samples (Ns: 60, 81, 119) that collected the same task (Monetary Incentive Delay task) across two runs and two sessions to evaluate the effects of analytic decisions on the individual (intraclass correlation coefficient [ICC(3,1)]) and group (Jaccard/Spearman rho) reliability estimates of BOLD activity of task fMRI data. The analytic decisions in this study vary across four categories: smoothing kernel (five options), motion correction (four options), task parameterizing (three options) and task contrasts (four options), totaling 240 different pipeline permutations. Across all 240 pipelines, the median ICC estimates are consistently low, with a maximum median ICC estimate of .43-.55 across the three samples. The analytic decisions with the greatest impact on the median ICC and group similarity estimates are the Implicit Baseline contrast, Cue Model parameterization and a larger smoothing kernel. Using an Implicit Baseline in a contrast condition meaningfully increased group similarity and ICC estimates as compared to using the Neutral cue. This effect was largest for the Cue Model parameterization; however, improvements in reliability came at the cost of interpretability. This study illustrates that estimates of reliability in the MID task are consistently low and variable at small samples, and a higher test-retest reliability may not always improve interpretability of the estimated BOLD signal.
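
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the integer placeholders standing in for the analytic options, and the toy inputs are all assumptions made for illustration. It shows how five, four, three and four options multiply to 240 pipelines, how ICC(3,1) follows from the two-way mean squares as (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error), and how a Jaccard coefficient compares two binary (thresholded) maps.

```python
from itertools import product


def count_pipelines():
    """Enumerate every combination of the four decision categories.

    The integers are placeholders for the actual options, which the
    abstract does not list in full.
    """
    smoothing = range(5)         # five smoothing kernels
    motion = range(4)            # four motion correction options
    parameterization = range(3)  # three task parameterizations
    contrast = range(4)          # four task contrasts
    return len(list(product(smoothing, motion, parameterization, contrast)))


def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `scores` is a list of rows, one per subject, each holding that
    subject's estimate in each of k sessions.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def jaccard(map_a, map_b):
    """Jaccard similarity of two equal-length binary maps (0/1 per voxel)."""
    both = sum(1 for a, b in zip(map_a, map_b) if a and b)
    either = sum(1 for a, b in zip(map_a, map_b) if a or b)
    return both / either if either else float("nan")


if __name__ == "__main__":
    print(count_pipelines())                         # 240
    # Toy data: every subject shifts by the same amount between sessions.
    print(icc_3_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
    print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because ICC(3,1) is a consistency measure, a uniform shift between sessions (as in the toy data) does not lower it; only changes in the rank and spacing of subjects do. In an actual analysis the ICC would be computed per voxel or region across subjects, which this sketch does not attempt.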

Version published to 10.1101/2024.03.19.585755 on bioRxiv
Mar 20, 2024
Version published to 10.1162/imag_a_00262
Jan 1, 2024

The Statistical Costs of Two-Step Signal Detection Analyses: A Case for a Maximum Likelihood Mixed-Effects Approach

This article has 4 authors:
1. Marie Jakob
2. Raphael Hartmann
3. Karl Christoph Klauer
4. Constantin Gregor Meyer-Grant
This article has no evaluations. Latest version: Dec 12, 2025

Estimating the reliability of the lure discrimination index for studying brain-behavior correlations and individual differences in memory

This article has 4 authors:
1. Zsuzsanna Nemecz
2. Tamás Szűcs
3. Attila Keresztes
4. Attila Krajcsi
This article has no evaluations. Latest version: Dec 16, 2025

Improving the reliability of the Pavlovian go/no-go task for computational psychiatry research

This article has 5 authors:
1. Samuel Zorowitz
2. Gili Karni
3. Natalie Paredes
4. Nathaniel Daw
5. Yael Niv
This article has no evaluations. Latest version: Jan 28, 2026
