Cross-validation under data dependence: a review-derived taxonomy and trustcv, a leakage-aware Python toolkit


Abstract

Inappropriate cross-validation can inflate performance estimates in medical machine learning, yet validation methods for grouped, temporal, and spatial data remain fragmented across literatures and incompletely supported in existing tools. We addressed this gap through a methodological study combining evidence synthesis, controlled synthetic illustration, and open-source implementation. Using purposive bidirectional snowballing from canonical seed papers, we identified and screened 29 cross-validation methods relevant to medical machine learning and organized them into a four-category taxonomy: independent and identically distributed (i.i.d., n = 9), grouped (n = 8), temporal (n = 8), and spatial (n = 4). Inclusion required documented medical or health-related application, or clear transferability to medical data structures. All 29 methods were implemented in trustcv (https://ki-smile.github.io/trustcv), a framework-agnostic Python toolkit that provides automated detection of six leakage types and structure-aware validation workflows. Controlled synthetic grouped benchmarks showed that, in the medium-leakage scenario, ignoring patient-level structure inflated area under the receiver operating characteristic curve (AUC) relative to Group K-Fold by 18.2 and 11.7 percentage points at the observation and patient levels, respectively. Implementation reliability was confirmed through automated verification of key correctness properties across 141 unit tests in 11 test modules. Together, the taxonomy and toolkit provide a practical foundation for more reliable, structure-aware model evaluation in medical machine learning.
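The patient-level leakage the abstract describes can be illustrated in a few lines. The sketch below uses scikit-learn's `KFold` and `GroupKFold` rather than trustcv's own API (which is not shown here), and the data are synthetic placeholders: plain K-Fold can place observations from the same patient in both the training and test folds, whereas Group K-Fold keeps each patient's observations within a single fold.

```python
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

# Hypothetical grouped data: 20 observations from 5 patients (4 each).
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 4)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)

# Plain KFold ignores patient structure, so the same patient's
# observations can appear in both train and test (leakage).
kfold_leaks = any(
    set(groups[tr]) & set(groups[te])
    for tr, te in KFold(n_splits=4, shuffle=True, random_state=0).split(X)
)

# GroupKFold assigns each group (patient) to exactly one test fold,
# so no patient straddles the train/test boundary.
groupkfold_leaks = any(
    set(groups[tr]) & set(groups[te])
    for tr, te in GroupKFold(n_splits=4).split(X, y, groups)
)

print(kfold_leaks, groupkfold_leaks)  # -> True False
```

Evaluating a model under the leaky splitter is what produces the optimistic AUC gap reported in the benchmarks; the grouped splitter removes that source of inflation by construction.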
