Scientific reasoning driven by influential data: resuscitate dfstat!

Andrej-Nikolai Spiess
Stefan Rödiger
Matthias Schaks
Michał Burdukiewicz
Joel Tellinghuisen

Abstract

In the biomedical literature, linear regression is one of the most widely used statistical procedures for analyzing and visualizing the association between two variables. Data points that exert influence on the fit and its parameters are routinely, though not as often as required, identified by established influence measures and their corresponding cut-off values. In this work, we are specifically concerned with influential data points that directly affect hypothesis testing of linear regressions, an effect that none of the established measures describe. Interestingly, the largely overlooked influence measure dfstat and its derived leave-one-out p-value exist exactly for this purpose, yet they are unmentioned in the majority of statistical textbooks and absent from all available statistical software packages. Their application for identifying these data points seems pivotal, as scientific reasoning in publications is almost exclusively based on the p-value of the fit, commonly using the α = 0.05 threshold to declare significance or its absence. With this metric, we found that in 29 of 100 digitizable papers published in Science, Nature and PNAS in 2016, a time when the “reproducibility crisis” was a growing concern, stated significances (or their absence) depend on the presence of a single influential data point.
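The leave-one-out p-value idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not compute dfstat itself. It only refits a simple linear regression with each point removed in turn and reports the points whose removal moves the slope's p-value across α = 0.05. All function names and the toy data are hypothetical. The t-distribution tail is approximated with a crude Simpson integration so that the example needs only the standard library; a statistics package would normally supply this.

```python
import math

def t_two_sided_p(t, df, n_steps=2000):
    """Two-sided p-value of a Student t statistic.

    Crude stand-in for a library routine: integrates the t density
    from 0 to |t| with Simpson's rule (n_steps must be even).
    """
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(v):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(v * v / df))

    h = t / n_steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return min(1.0, max(0.0, 1.0 - 2.0 * area))

def slope_p_value(x, y):
    """p-value for H0: slope = 0 in the least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: treat as maximally significant
        return 0.0
    return t_two_sided_p(slope / se, n - 2)

def significance_reversers(x, y, alpha=0.05):
    """Return (full-data p, leave-one-out p list, indices that flip significance)."""
    p_full = slope_p_value(x, y)
    p_loo = [slope_p_value(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
             for i in range(len(x))]
    flips = [i for i, p in enumerate(p_loo) if (p < alpha) != (p_full < alpha)]
    return p_full, p_loo, flips

# Toy data (invented for illustration): eight unstructured points plus one
# distant point that creates the apparent trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [2, 1, 3, 1, 2, 3, 1, 2, 10]

p_full, p_loo, flips = significance_reversers(x, y)
print(f"full-data p = {p_full:.4f}")
for i in flips:
    print(f"removing point {i} gives p = {p_loo[i]:.4f}")
```

In this toy data set the fit is significant only because of the last point: removing it moves the p-value above 0.05, while removing any other point leaves the fit significant.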

Version published to 10.1101/2024.10.30.621016 on bioRxiv
Oct 30, 2024

