EFA-Boosting: An Iterative Optimization Algorithm for Exploratory Factor Analysis
Abstract
Item removal decisions in exploratory factor analysis (EFA) are often only loosely documented in methods sections, which limits transparency and makes results difficult to reproduce. This paper introduces EFA-Boosting, an iterative optimization algorithm designed to systematically identify and eliminate problematic items (e.g., items with cross-loadings, weak primary loadings, or Heywood cases) using configurable rules that produce a complete audit trail of every elimination decision. The algorithm follows a hierarchical logic in which structural violations are addressed before fit refinement, and it evaluates candidate solutions using a composite loss function that integrates RMSEA, SRMR, and CFI through adaptive weights. Because RMSEA can be spuriously inflated when degrees of freedom are very small, the RMSEA weight is automatically recalibrated when degrees of freedom fall below five; preliminary calibration studies indicated that this threshold is appropriate for preventing spurious RMSEA inflation from driving elimination decisions. Monte Carlo simulations encompassing 27,000 replications across 54 experimental conditions demonstrated that specificity ranged from 95% to 100% in multifactorial models and from 85% to 92% in unidimensional models, while sensitivity averaged between 73% and 80% overall and exceeded 90% when models contained three factors with sample sizes of at least 500 observations. Comparative analyses against a threshold-based baseline procedure revealed that conventional loading-magnitude criteria achieved zero percent sensitivity because items with cross-loadings typically maintain high primary loadings that exceed standard cutoffs, whereas EFA-Boosting achieved 77.5% sensitivity with a Youden index of 0.74, representing a substantial improvement in detecting structurally problematic items.
Although the algorithm does not eliminate researcher judgment regarding threshold selection, factor enumeration, or the balance between statistical and substantive criteria, it provides documented traceability that enables other researchers to examine and replicate the sequence of decisions that shaped the final factor solution.
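
The composite loss described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that RMSEA, SRMR, and CFI are combined through adaptive weights and that the RMSEA weight is recalibrated when degrees of freedom fall below five. The names `FitIndices` and `composite_loss`, the equal base weights, the use of `1 - CFI` as the CFI misfit term, and the down-weighting factor `low_df_scale=0.5` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    """Fit statistics for one candidate factor solution (hypothetical container)."""
    rmsea: float  # lower is better
    srmr: float   # lower is better
    cfi: float    # higher is better
    df: int       # model degrees of freedom


def composite_loss(fit, base_weights=(1.0, 1.0, 1.0), df_threshold=5, low_df_scale=0.5):
    """Weighted average of misfit terms; lower values indicate better fit.

    ASSUMPTION: the paper's actual weights and recalibration rule are not
    given in the abstract. Here the RMSEA weight is simply multiplied by
    `low_df_scale` whenever df < df_threshold.
    """
    w_rmsea, w_srmr, w_cfi = base_weights
    if fit.df < df_threshold:
        # Down-weight RMSEA so small-df inflation does not dominate the loss.
        w_rmsea *= low_df_scale
    total = w_rmsea + w_srmr + w_cfi
    # CFI is a goodness-of-fit index, so it enters as (1 - CFI).
    misfit = w_rmsea * fit.rmsea + w_srmr * fit.srmr + w_cfi * (1.0 - fit.cfi)
    return misfit / total
```

Under these assumed settings, two candidate solutions with identical RMSEA, SRMR, and CFI receive different losses depending on whether their degrees of freedom fall below the threshold, which is the behavior the recalibration is meant to produce.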