Robust group- but limited individual-level (longitudinal) reliability and insights into cross-phases response prediction of conditioned fear

Maren Klingelhöfer-Jens
Mana R Ehlers
Manuel Kuhn
Vincent Keyaniyan
Tina B Lonsdorf

Curated by eLife

Evaluation Summary

The authors comprehensively assess the measurement properties of behavioral (skin conductance and ratings) and fMRI measures of fear conditioning (acquisition and extinction) in a sample of 107 participants, with 71 providing retest measures at 6 months. Retest reliability was generally low, whereas internal-consistency reliability was generally high. At the group level, reliability and criterion validity were generally good. Most measurements proved sensitive to modality, processing, or statistical decisions. Results are framed within a larger discussion of the role of measurement properties in individual difference research and clinical translation and will serve as an important building block towards improvement in both these areas.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Here, we follow the call to target measurement reliability as a key prerequisite for individual-level predictions in translational neuroscience by investigating (1) longitudinal reliability at the individual and (2) group level, (3) internal consistency and (4) response predictability across experimental phases. One hundred and twenty individuals performed a fear conditioning paradigm twice 6 months apart. Analyses of skin conductance responses, fear ratings and blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) with different data transformations and included numbers of trials were conducted. While longitudinal reliability was rather limited at the individual level, it was comparatively higher for acquisition but not extinction at the group level. Internal consistency was satisfactory. Higher responding in preceding phases predicted higher responding in subsequent experimental phases at a weak to moderate level depending on data specifications. In sum, the results suggest that while individual-level predictions are meaningful for (very) short time frames, they also call for more attention to measurement properties in the field.

Version published to 10.7554/elife.78717 on eLife
Sep 13, 2022
eLife
May 19, 2022

Reviewer #1 (Public Review):

The authors comprehensively assess the measurement properties (reliability, criterion/predictive validity) of behavioral and neural measures of fear acquisition/extinction, with a focus on longitudinal reliability and consequences of analytic and processing choices via a multiverse approach. In a longitudinal design (6-month interval), the authors collected fear acquisition and extinction measures (SCR, ratings, fMRI) at two time points in a sample that is relatively large for this type of work. Most notably, test-retest reliability, which is identified as a key component in individual-difference and clinical translation work, was generally low, whereas internal consistency was generally high. Group-level (averaged) reliability and cross-phase prediction (i.e., criterion validity) were generally good. Most measurement indices varied as a function of modality, processing, or statistical decisions. This work is framed within a larger discussion of the role of measurement properties in individual difference work and clinical translation and will serve as an important building block towards improvement in both these areas.

The conclusions of this work are largely supported by the data and methodological approach, and this is a good benchmark for the field. However, some aspects could be clearer or streamlined, and some analytic choices are relative weaknesses.

Strengths:

The overall approach is excellent and represents the vanguard of open science practices (preregistration, all materials freely available, documentation of analysis deviations, multiverse analyses, etc.). Relatedly, this comprehensive approach reveals how different analytic choices/researcher degrees of freedom can have sometimes drastic effects on fundamental measurement properties. I think this underlines what I view as the key contribution of this manuscript: empirically highlighting the need for the fear conditioning field to pay more attention to measurement properties.

Going beyond standard associative measures of reliability (ICCs) is an important contribution of this work, as it allows the authors to comment on nuances of individual-difference reliability that the coarser ICCs cannot capture. In turn, this helps researchers make more informed decisions regarding the design of fear conditioning tasks to assess individual differences.
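For readers less familiar with ICCs, the following is a minimal sketch of two common two-way, single-measurement forms (consistency and absolute agreement) computed from a participants-by-sessions table. The function name `icc_two_way` and the toy scores are assumptions made for illustration; the sketch does not claim to reproduce the ICC variant or the data used in the manuscript.

```python
from statistics import mean


def icc_two_way(scores):
    """Two-way single-measurement ICCs from a participants x sessions table.

    scores: list of rows, one row per participant, one column per session.
    Returns (consistency, absolute_agreement).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for participants, sessions, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-participants mean square
    msc = ss_cols / (k - 1)              # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement


# Hypothetical scores: every participant shifts by the same amount at retest.
toy_scores = [[1, 3], [2, 4], [3, 5]]
consistency, agreement = icc_two_way(toy_scores)
print(round(consistency, 3), round(agreement, 3))
```

In this toy table the rank order of participants is perfectly preserved, so the consistency form equals 1, while the uniform shift between sessions pulls the absolute-agreement form down to about 0.33. This is one reason a single coefficient can hide distinct aspects of reliability.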

The fMRI results are a particular strength, as fMRI continues to be a common fear conditioning index, yet its measurement properties within these studies are critically understudied. The choice to use standard ICCs in conjunction with similarity approaches is particularly fruitful here: together with overlap metrics, it gives a much better appraisal of the different components of reliability in fMRI data, and of potential explanations for differences between behavioral and fMRI reliabilities.

Weaknesses:

The authors structure their effort around the premise that reliability is essential in conducting solid individual-differences science, which I agree with wholeheartedly. However, I think the authors rely on relatively arbitrary cut-offs for classifying reliability as good/poor/etc. to an extent that is not warranted, particularly in the Discussion, and this takes away from the impact of the effort. As the authors point out, these categorical cut-offs are more guidelines than strict rules, yet the manuscript is structured around the premise that individual-level reliability is problematically poor. Many cut-off recommendations are based on psychometric work on trait self-report measures, which usually assume fewer determinants/sources of error than would be seen in neuroscience experiments, which in turn allows for higher ceilings for effect sizes and reliability. The current manuscript does not address this issue, nor what meaningful (as opposed to good) fear conditioning reliability is once one moves away from the categorical cut-offs. In other words, is it possible that the authors actually observed "good" reliability in the context of fear conditioning work, and that reliability being lower than in other types of paradigms is simply inherent to the construct being studied?

The internal consistency (cross-sectional reliability) calculation used is not well justified and potentially needs additional parameters. It is not clear why the authors deviate from the internal consistency calculation described in Parsons, Kruijt, and Fox, 2019, especially given that these procedures are used for other metrics elsewhere in the manuscript.
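To make the notion of internal consistency concrete, here is a minimal sketch of a permutation-based split-half estimate with Spearman-Brown correction, a widely used generic approach. The function names, the number of random splits, and the simulated trial data are all assumptions for illustration; the sketch is not claimed to match the exact procedure of Parsons, Kruijt, and Fox or the calculation used in the manuscript.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def split_half_reliability(trials, n_splits=1000, seed=0):
    """Average Spearman-Brown corrected split-half correlation.

    trials: one list of trial-level responses per participant,
            all of the same length.
    """
    rng = random.Random(seed)
    n_trials = len(trials[0])
    idx = list(range(n_trials))
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # random assignment of trials to the two halves
        half_a, half_b = idx[: n_trials // 2], idx[n_trials // 2:]
        a = [mean(p[i] for i in half_a) for p in trials]
        b = [mean(p[i] for i in half_b) for p in trials]
        r = pearson(a, b)
        estimates.append(2 * r / (1 + r))  # Spearman-Brown correction
    return mean(estimates)


# Simulated (hypothetical) data: 10 participants, 8 trials each, where each
# trial is a stable participant-specific level plus a little noise.
data_rng = random.Random(1)
toy_trials = [[level + data_rng.gauss(0, 0.1) for _ in range(8)]
              for level in range(10)]
print(round(split_half_reliability(toy_trials, n_splits=200), 3))
```

Averaging over many random splits avoids dependence on one arbitrary split (for example odd versus even trials), which is the kind of parameter choice the paragraph above asks the authors to justify.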

In fMRI analyses, the authors use an ROI approach based on prior studies of fear acquisition and extinction. The majority of the most consistently identified regions (as seen in meta-analyses, Fullana et al., 2016, 2018) are analyzed. However, it is not clear why other regions are omitted, particularly given meta-analytic evidence. Striatal regions and the thalamus are the most notable omissions. A further weakness is that functional ROIs in this study were based on peak coordinates from a handful of prior studies, instead of meta-analytically identified coordinates. As such, I do not think the authors present the strongest foundation for making conclusions about the reliability of fear conditioning fMRI data.

Reviewer #2 (Public Review):

The manuscript describes a large set of statistical analyses on fear conditioning data from 107 participants (N=71 at two time points six months apart). The analyses comprise approaches to determine the reliability and predictability of conditioned fear responses: skin conductance, ratings, and fMRI data.

The approach is thorough, with a range of analysis approaches, including within- and between-subjects similarity, the individual-level overlap of fMRI results, intraclass correlation coefficients, and cross-sectional reliability. It is important to determine these values so that researchers can discard incorrect assumptions, such as the belief that threat responses at baseline can be predictive of treatment responses in patient populations.

The poor reliability identified by several of these approaches is likely to be of great importance to this large, translational field. A positive result was good reliability at the group level for fear learning, but not extinction.

Version published to 10.1101/2022.03.15.484434v1 on bioRxiv
Mar 18, 2022
