Using Classification Trees to Identify the Best Method in Monte Carlo Simulations: From Population Parameters to Observed Features

Abstract

Monte Carlo simulations are widely used to compare statistical methods, but their findings are often difficult to interpret and rarely translate into practical guidelines for method selection. This gap arises for two reasons. First, method performance often depends on complex interactions among simulation factors that are hard to detect with conventional summaries. Second, performance is typically evaluated against population values that are unknown in practice, whereas applied researchers only see features of their sample data. We propose a classification tree framework that addresses both problems, along with two pruning strategies tailored to the simulation context: a combined pruning procedure that accounts for equivalent representations of the same data, and effect-size-based pruning that prevents large numbers of replications from inflating tree complexity. We illustrate this framework by selecting among zero cell correction strategies for estimating tetrachoric correlations. In Example 1, we reanalyze results from Choi & Wu (2026) using the original simulation factors as predictors, yielding clear decision rules that still rely on population values. In Example 2, we generate data under continuous simulation factors and construct predictors directly from observed 2×2 contingency tables, allowing the resulting rules to be applied to real data. More broadly, the proposed framework provides a general approach for translating Monte Carlo comparisons of competing methods into practical method selection guidelines.
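As a minimal sketch of the idea behind Example 2 (assumptions, not the authors' code): instead of population parameters, the predictors fed to the classification tree are features computed directly from an observed 2×2 contingency table, and a standard zero cell correction (adding 0.5 to every cell when any cell is empty, one of several correction strategies) is applied before estimating the tetrachoric correlation. The feature names below are illustrative choices, not the paper's exact predictor set.

```python
# Hypothetical sketch: observed-data predictors from a 2x2 table,
# plus the classic 0.5 continuity correction for zero cells.

def table_features(table):
    """Compute tree predictors from a 2x2 table [[a, b], [c, d]].
    Feature choices here are illustrative assumptions."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return {
        "n": n,                                 # total sample size
        "has_zero_cell": min(a, b, c, d) == 0,  # any empty cell?
        "row1_margin": (a + b) / n,             # marginal proportion, row 1
        "col1_margin": (a + c) / n,             # marginal proportion, col 1
        "min_cell_prop": min(a, b, c, d) / n,   # rarest cell's proportion
    }

def apply_zero_cell_correction(table, c=0.5):
    """Add c to every cell if any cell is zero (one common strategy);
    otherwise return the table unchanged."""
    cells = [x for row in table for x in row]
    if min(cells) == 0:
        return [[x + c for x in row] for row in table]
    return [list(row) for row in table]

observed = [[0, 12], [7, 31]]          # a zero-cell table
feats = table_features(observed)
corrected = apply_zero_cell_correction(observed)
```

A fitted classification tree would then split on features like `has_zero_cell` or `min_cell_prop` to recommend a correction strategy, so the resulting decision rules depend only on quantities a researcher can actually observe.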
