How Much Variance Does Your Model Explain? A Clarifying Note on the Use of Split-Half Reliability for Computing Noise Ceilings


Abstract

Noise ceilings estimated from a dataset's split-half reliability offer a powerful way to quantify how much variance a model can in principle explain given the noise in the dataset, allowing researchers to assess model performance relative to an upper bound. In this work, we caution against a common pitfall in this approach to estimating noise ceilings. Specifically, even though the split-half reliability is expressed as a correlation coefficient, it reflects the maximum explained variance of a perfect model, not the maximum correlation. Reading it as the maximum correlation is a subtle misinterpretation that leads to artificially low noise ceilings and, as a consequence, may inflate how close models appear to be to the noise ceiling. A systematic literature analysis suggests that this overly permissive ceiling is the most prevalent interpretation of noise ceilings estimated through split-half reliability. The purpose of this work is to explain when the mistake happens, why it happens, what its consequences are, and how to avoid it. Toward this end, we offer a general explanation of how split-half reliabilities relate to the performance of a maximally predictive model, supplemented by simulations and mathematical derivations. Overall, this clarifying piece is meant to help researchers better understand the statistical underpinnings of noise ceilings and support more consistent reporting across studies.
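The central claim can be checked with a small simulation. The sketch below is illustrative and is not the authors' code. It assumes a true signal plus independent Gaussian noise of equal variance in each half, and it evaluates a "perfect" model (one that outputs the noise-free signal exactly) against a single half of the data. The helper names `pearson` and `simulate` are hypothetical. Under these assumptions the perfect model's correlation with one half approaches the square root of the split-half reliability, so its squared correlation (explained variance) approaches the split-half reliability itself.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_items=20000, signal_sd=1.0, noise_sd=1.0, seed=0):
    """Simulate two independent noisy halves of the same underlying signal.

    Assumption: additive, independent Gaussian noise with equal variance in
    each half. Returns the split-half reliability and the correlation between
    a perfect model (the noise-free signal) and one half of the data.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, signal_sd) for _ in range(n_items)]
    half_a = [s + rng.gauss(0.0, noise_sd) for s in signal]
    half_b = [s + rng.gauss(0.0, noise_sd) for s in signal]
    split_half_r = pearson(half_a, half_b)
    perfect_model_r = pearson(signal, half_a)
    return split_half_r, perfect_model_r


if __name__ == "__main__":
    r_hh, r_perfect = simulate()
    print(f"split-half reliability r_hh:        {r_hh:.3f}")
    print(f"perfect model r with one half:      {r_perfect:.3f}")
    print(f"perfect model r^2 (explained var.): {r_perfect ** 2:.3f}")
    print(f"sqrt(r_hh):                         {math.sqrt(r_hh):.3f}")
```

With equal signal and noise variance, the theoretical values are a split-half reliability of 0.5 and a perfect-model correlation of about 0.71 (the square root of 0.5). Treating 0.5 as the maximum attainable correlation would therefore place the ceiling well below what a perfect model actually achieves, which is the pitfall the abstract describes.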