How Much Data Is Enough? A Design-aware Approach to Empirical Sample Complexity in Political Science
Abstract
How much data is needed to ensure that a model performs reliably on new, unseen data? Despite their central importance to empirical research design, sample size decisions are often made heuristically, guided more by resource constraints than by principled diagnostics. Existing tools such as power analysis and cross-validation offer limited insight into how predictive performance scales with sample size. We introduce a design-aware, empirical framework for estimating sample complexity bounds tailored to applied settings. By fitting smooth extrapolation functions to model performance from resampled pilot data, our method estimates the sample size needed to achieve researcher-specified generalization guarantees. Through applications to supervised learning tasks involving extensive human-annotated data, we show that generalization often stabilizes with as little as 10% of typical labeling costs. This approach provides a statistically grounded, interpretable diagnostic for generalization performance and a practical tool for political scientists designing data-intensive studies under resource constraints or design uncertainty.
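To make the extrapolation idea concrete, the sketch below illustrates one way such a procedure could look: repeatedly subsample a labeled pilot dataset at increasing sizes, record held-out performance, fit a smooth inverse power-law learning curve, and invert it to find the sample size that reaches a researcher-specified target. The classifier, the curve form, the 95%-of-plateau target, and all variable names are illustrative assumptions, not the authors' exact specification.

```python
# Minimal sketch of learning-curve extrapolation from resampled pilot data
# (illustrative assumptions throughout; not the paper's exact procedure).
import numpy as np
from scipy.optimize import curve_fit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic pilot data standing in for human-annotated labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
sizes = np.linspace(100, len(X_pool), 10, dtype=int)
acc = []
for n in sizes:
    scores = []
    for _ in range(20):  # resample to average out subsampling noise
        idx = rng.choice(len(X_pool), size=n, replace=False)
        model = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
        scores.append(model.score(X_test, y_test))
    acc.append(np.mean(scores))
acc = np.array(acc)

# Smooth extrapolation function: an inverse power law, accuracy(n) = c - a * n**(-b).
def learning_curve(n, a, b, c):
    return c - a * n ** (-b)

params, _ = curve_fit(learning_curve, sizes, acc, p0=[1.0, 0.5, acc.max()], maxfev=10000)
a, b, c = params

# Example target: reach 95% of the estimated performance plateau.
target = 0.95 * c
# Invert the fitted curve: c - a * n**(-b) = target  =>  n = (a / (c - target))**(1 / b).
n_needed = (a / (c - target)) ** (1 / b)
print(f"Estimated plateau accuracy: {c:.3f}")
print(f"Approximate sample size for target {target:.3f}: {int(np.ceil(n_needed))}")
```

In practice the target would be set by the researcher's own generalization requirement rather than a fixed fraction of the plateau, and uncertainty in the fitted curve (for example, via bootstrap over the resampled subsets) would be carried into the reported sample size bound.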