Probing missing data in population-based longitudinal studies: A tutorial and application using R
Abstract
A common challenge in longitudinal population-based research is the amount of incomplete and missing data that arises when participants fail to complete the study protocol, and from potential loss to follow-up over time. This missingness can bias parameter estimates, reduce statistical power, and ultimately distort the interpretation of findings. Yet few studies report key information about missingness in their sample. Moreover, while a wealth of information already exists on the types of missingness and on methods for handling it, the field lacks practical guidance on how to conduct a missingness analysis in a real-world setting. In this tutorial, we illustrate key steps in handling missing data and give researchers an opportunity to practice identifying the magnitude and patterns of missing data and assessing both selection and retention bias. We use a large, publicly available, longitudinal pediatric population-based study, with air pollution as the primary exposure and internalization, a type of emotional health behavior, as the outcome. We provide reproducible R code that researchers can readily adapt to their own longitudinal observational studies.
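As a rough illustration of the first diagnostic steps named above (magnitude of missingness, missingness patterns, and a simple retention comparison), here is a minimal base-R sketch. It assumes a small hypothetical wide-format data frame. The variable names `pm25`, `intern_w1`, `intern_w2`, and `retained` and all values are invented for illustration. This is not the tutorial's actual dataset or code.

```r
# Hypothetical toy data in wide format (NOT from the study): one row per child,
# a baseline exposure and an outcome measured at two waves.
dat <- data.frame(
  id        = 1:6,
  pm25      = c(8.1, NA, 10.4, 7.7, 9.0, NA),  # baseline air pollution exposure
  intern_w1 = c(52, 48, NA, 60, 55, 50),       # internalizing score, wave 1
  intern_w2 = c(54, NA, NA, 58, NA, 49)        # internalizing score, wave 2
)
vars <- c("pm25", "intern_w1", "intern_w2")

# 1. Magnitude: proportion of missing values per variable
prop_missing <- colMeans(is.na(dat[, vars]))
print(round(prop_missing, 2))

# 2. Patterns: encode each row as a string of observed (1) / missing (0) flags
#    and count how many participants share each pattern
pattern <- apply(!is.na(dat[, vars]), 1,
                 function(r) paste(as.integer(r), collapse = ""))
pattern_counts <- table(pattern)
print(pattern_counts)

# 3. Retention: flag participants whose wave-2 outcome is observed, then compare
#    baseline means between retained and lost-to-follow-up participants
#    (means use available cases only, via na.rm = TRUE)
dat$retained <- !is.na(dat$intern_w2)
retention_summary <- sapply(c("pm25", "intern_w1"), function(v)
  tapply(dat[[v]], dat$retained, mean, na.rm = TRUE))
print(retention_summary)
```

The pattern table is a simple stand-in for dedicated tools such as a missing-data pattern plot. Large baseline differences between retained and non-retained participants are one informal signal of possible retention bias. A formal assessment, for example modeling retention on baseline covariates, would follow from this.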