Tackling challenges in data pooling: missing data handling in latent variable models with continuous and categorical indicators

Lihan Chen
Milica Miocevic
Carl F. Falk

Abstract

Data pooling is a powerful strategy in empirical research, but combining multiple datasets often results in a large amount of missing data: variables that are not available in every dataset will be missing for entire groups of participants. Furthermore, data pooling typically leads to a mix of continuous and categorical items with nonnormal multivariate distributions. We investigated two popular approaches to handling missing data in this context: (1) applying direct maximum likelihood by treating the data as continuous (con-ML), and (2) applying categorical least squares using a polychoric correlation matrix computed with pairwise deletion (cat-LS). Both approaches are freely available and relatively straightforward for empirical researchers to implement. Through simulation studies with confirmatory factor analysis and latent mediation analysis, we found cat-LS to be unsuitable for pooled data analysis, whereas con-ML yielded acceptable performance for the estimation of latent path coefficients, barring severe nonnormality.
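
As a rough illustration of the missing-data pattern described above, and of the idea behind direct maximum likelihood, the following Python sketch pools two small datasets and evaluates a casewise normal log-likelihood over observed entries only. It is not the authors' code: the study labels, item names (x1, x2, x3), data values, and the helper casewise_loglik are all hypothetical, and the mean vector and covariance matrix are placeholders rather than model-implied estimates from a fitted factor model.

```python
import numpy as np
import pandas as pd

# Two hypothetical studies that measured overlapping but not identical items.
study_a = pd.DataFrame({"x1": [0.2, -1.1, 0.7], "x2": [1.0, 0.3, -0.4]})
study_b = pd.DataFrame({"x1": [0.5, -0.3], "x3": [0.9, -1.2]})

# Pooling stacks the rows; an item absent from a study becomes NaN
# for every participant of that study.
pooled = pd.concat([study_a, study_b], keys=["A", "B"], names=["study", "row"])

# Proportion of missing values per item within each study.
missing_by_study = pooled.isna().groupby(level="study").mean()


def casewise_loglik(data, mu, sigma):
    """Sum of multivariate normal log-densities, using only the
    observed entries of each row (hypothetical helper)."""
    total = 0.0
    for row in np.asarray(data, dtype=float):
        obs = ~np.isnan(row)
        if not obs.any():
            continue  # a fully missing row carries no information
        resid = row[obs] - mu[obs]
        cov = sigma[np.ix_(obs, obs)]
        _, logdet = np.linalg.slogdet(cov)
        quad = resid @ np.linalg.solve(cov, resid)
        total += -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + quad)
    return total


# Placeholder parameters (assumption): zero means, identity covariance.
loglik = casewise_loglik(pooled, mu=np.zeros(3), sigma=np.eye(3))
print(missing_by_study)
print(loglik)
```

In an actual analysis, the mean vector and covariance matrix would be implied by the latent variable model, and this log-likelihood would be maximized over the model parameters by dedicated software.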

Version published to 10.31219/osf.io/gsq6f_v1 on OSF Preprints
Feb 28, 2025
Version published to 10.1080/10705511.2023.2300079
Feb 16, 2024

Related articles

Multiple imputation using multivariate adaptive regression splines

This article has 1 author:
1. Jerome Sepin
This article has no evaluations. Latest version: May 13, 2025

Missing Data Handling via EM and Multiple Imputation in Network Analysis using glasso and atan Regularization

This article has 2 authors:
1. Kai Jannik Nehler
2. Martin Schultze
This article has no evaluations. Latest version: Apr 30, 2025

Predicting Dropout in Intensive Longitudinal Data: Extending the Joint Model for Autocorrelated Data

This article has 3 authors:
1. Fridtjof Petersen
2. Laura Francina Bringmann
3. Dimitris Rizopoulos
This article has no evaluations. Latest version: May 13, 2025
