Tackling challenges in data pooling: missing data handling in latent variable models with continuous and categorical indicators
Abstract
Data pooling is a powerful strategy in empirical research, but combining multiple datasets often produces a large amount of missing data. A variable that was not measured in every dataset will be missing for all participants from the datasets that lack it. Furthermore, data pooling typically yields a mix of continuous and categorical items with nonnormal multivariate distributions. We investigated two popular approaches to handling missing data in this context: (1) direct maximum likelihood, treating the data as continuous (con-ML), and (2) categorical least squares based on a polychoric correlation matrix computed with pairwise deletion (cat-LS). Both approaches are freely available and relatively straightforward for empirical researchers to implement. Through simulation studies with confirmatory factor analysis and latent mediation analysis, we found cat-LS to be unsuitable for pooled data analysis. In contrast, con-ML yielded acceptable performance for estimating latent path coefficients except under severe nonnormality.
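To make the block-wise missingness of pooled data concrete, here is a minimal sketch built on a hypothetical example. The study names, item names, and sample sizes are invented for illustration and are not taken from the article. The sketch does not implement con-ML or cat-LS. It only computes three things:

- the study-level observed/missing pattern;
- the per-item missing rate after pooling;
- the number of participants that pairwise deletion would retain for each pair of items.

A pair of items that no single study measured together has zero retained cases, so no pairwise-deletion correlation can be estimated from the data for that pair.

```python
from itertools import combinations

# Hypothetical pooled studies (illustrative only): which items each study
# measured and how many participants it contributed.
STUDIES = {
    "study_A": {"n": 200, "items": ["x1", "x2", "x3", "m1"]},
    "study_B": {"n": 150, "items": ["x1", "x2", "m1", "y1"]},
    "study_C": {"n": 100, "items": ["x1", "m1", "y1"]},
}


def all_items(studies):
    """Sorted union of every item measured in at least one study."""
    return sorted({item for s in studies.values() for item in s["items"]})


def pooled_pattern(studies):
    """Per study, a 0/1 row over all pooled items (1 = observed).

    Every participant from a given study shares that study's row, which is
    why pooling produces block-wise (whole-group) missingness.
    """
    items = all_items(studies)
    return {name: [1 if item in s["items"] else 0 for item in items]
            for name, s in studies.items()}


def missing_rate(studies):
    """Proportion of pooled participants missing each item."""
    total_n = sum(s["n"] for s in studies.values())
    return {item: sum(s["n"] for s in studies.values()
                      if item not in s["items"]) / total_n
            for item in all_items(studies)}


def pairwise_coverage(studies):
    """Participants observed on both items of each pair, i.e. the cases
    pairwise deletion would keep when estimating that pair's correlation."""
    return {(a, b): sum(s["n"] for s in studies.values()
                        if a in s["items"] and b in s["items"])
            for a, b in combinations(all_items(studies), 2)}


def uncovered_pairs(studies):
    """Item pairs never observed together in any single study."""
    return [pair for pair, n in pairwise_coverage(studies).items() if n == 0]


if __name__ == "__main__":
    print(all_items(STUDIES))
    for name, row in pooled_pattern(STUDIES).items():
        print(name, row)
    print(missing_rate(STUDIES))
    print(uncovered_pairs(STUDIES))
```

In this toy setup, `x3` and `y1` are never measured in the same study. Their pairwise coverage is therefore zero, even though each item on its own is observed for a substantial share of the pooled sample. Direct maximum likelihood works differently: it lets each case contribute through whichever variables it has observed, rather than estimating each correlation from the cases complete on that pair.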