A Simulation Study to Advance Human-Centred Artificial Intelligence via Digital Citizen Science: Can Large Language Models Transform Current Approaches to Missing Data Imputation?

Abstract

Background

Missing data is a persistent challenge in digital health research, and traditional approaches like Multiple Imputation by Chained Equations (MICE) may not capture complex patterns. While large language models (LLMs) could offer a viable alternative, their use in this context remains understudied. Moreover, a critical gap remains in embedding human-centred artificial intelligence (AI) approaches that integrate equity, transparency, and stakeholder participation. Digital citizen science, which leverages citizen-owned devices for ethical, participatory big data collection, offers a foundation to advance such approaches in digital health.

Objective

To evaluate and compare the imputation accuracy of MICE with the OpenAI o3 model for categorical variables in a simulated digital health dataset under different missingness mechanisms and levels, while situating this evaluation within the broader vision of human-centred AI enabled by digital citizen science.

Methods

A complete digital health dataset collected through a digital citizen science platform was used to simulate missingness under Missing at Random (MAR) and Missing Completely at Random (MCAR) mechanisms at 10%, 25%, and 50% missingness levels. MICE used logistic regression with five imputations and ten iterations per chain. For the o3 model, a structured prompt was generated for each missing entry using all available non-missing variables from the same record. Both methods were evaluated on each simulated dataset using classification accuracy and a closeness metric quantifying similarity to the original data. Statistical differences were tested with a two-sample Z-test, and misclassification patterns were examined by variable type and category frequency.
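The MICE side of this design can be illustrated with a minimal sketch: simulate a complete dataset, induce MCAR missingness in a binary categorical variable, then fill each gap with stochastic draws from a logistic regression fitted on the observed cases, repeated five times to mirror the study's five imputations. All variable names and the data-generating process here are illustrative assumptions, not the study's actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Illustrative "complete" dataset: two numeric predictors and one
# binary categorical outcome (names and signal are assumptions).
n = 500
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
y_true = (rng.random(n) < p).astype(int)

# Induce 25% MCAR missingness in the categorical variable.
miss = rng.random(n) < 0.25

# One chained-equation step: fit logistic regression on complete
# cases, then draw m=5 stochastic imputations from the predicted
# class probabilities (echoing MICE's five imputations).
model = LogisticRegression().fit(X[~miss], y_true[~miss])
probs = model.predict_proba(X[miss])[:, 1]
imputations = [(rng.random(miss.sum()) < probs).astype(int)
               for _ in range(5)]

# Pooled point prediction: majority vote across the 5 imputations,
# scored against the held-out true values as classification accuracy.
pooled = (np.mean(imputations, axis=0) >= 0.5).astype(int)
accuracy = (pooled == y_true[miss]).mean()
print(f"imputation accuracy: {accuracy:.2f}")
```

A full MICE implementation would cycle these regression steps across every incomplete variable for ten iterations per chain; the single pass above only shows the core fit-then-draw mechanic.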

Results

Under MAR conditions, MICE and o3 performed similarly, with average accuracies of 0.60 and 0.59 and closeness metrics of 0.83 and 0.85, respectively. Under MCAR, both methods achieved 0.59 accuracy, with closeness metrics of 0.84 and 0.85. No statistically significant differences were found across conditions (all p > 0.05).
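The reported comparison can be reproduced in outline with a standard two-proportion Z-test on the accuracy figures. The sample size below (200 imputed cells per method) is an assumption for illustration, since the abstract does not report the number of imputed entries.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """Two-sample Z-test for a difference in proportions,
    using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Reported MAR accuracies (0.60 for MICE vs 0.59 for o3);
# n = 200 per method is a hypothetical count, not from the study.
z, p = two_proportion_z(0.60, 0.59, 200, 200)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With these assumed sample sizes the p-value comes out well above 0.05, consistent with the abstract's finding of no significant difference.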

Conclusion

While MICE remains preferred for continuous data, the o3 model shows promise as a complementary tool for categorical imputation in smaller datasets. Beyond methodological comparability, this study demonstrates how digital citizen science can serve as an ethical foundation for embedding human-centred AI into digital health research, positioning large language models not only as technical tools but also as vehicles for advancing equity, transparency, and participatory innovation in healthcare.
