Automated Detection of Invalid Responses to Creativity Assessments

Abstract

Participants in creativity studies sometimes produce invalid data that is unusable for analysis, such as nonsensical or incomplete responses to idea generation tasks. Identifying such responses is time-consuming yet necessary for robust results, and it remains challenging to automate. We explore the efficacy of transformer language models (TLMs) for automatically detecting invalid creativity responses, using only the task prompt and the participant's text response, with no other metadata about the experimental session. We train a suite of transformers to detect invalid data for two creativity assessments: the Alternate Uses Task (AUT) and a design problems task (DPT). We find that transformers generally outperform other baselines on both tasks. Further, we show that TLMs' predictions are well calibrated to the quality of the participant response, so that model failures occur in a predictable way and high-quality responses are unlikely to be labeled invalid. Finally, we conduct a fairness analysis based on language background, along with an adversarial study in which participants attempt to "break" the model by producing invalid responses that are nonetheless labeled valid. Our results demonstrate the potential of deep learning methods for cleaning creativity assessment data, using solely participant responses, in a reliable and unbiased way.
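To make the setup concrete, the sketch below shows how a classifier of this kind could be fine-tuned with the Hugging Face transformers library: the task prompt and the participant response are encoded as a text pair, and a binary head predicts whether the response is invalid. The RoBERTa backbone, the label convention (1 = invalid), and the toy examples are illustrative assumptions, not the authors' actual models or data.

# Minimal sketch (not the paper's code): fine-tune a transformer to flag
# invalid creativity responses from the task prompt + participant response only.
# Backbone, label convention, and example data are assumptions for illustration.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumed backbone; the paper trains a suite of TLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy labeled examples: (task prompt, participant response, label), 1 = invalid.
train_data = [
    ("List alternate uses for a brick.", "paperweight", 0),
    ("List alternate uses for a brick.", "asdf jkl", 1),
    ("Design a quieter vacuum cleaner.", "add sound-dampening foam around the motor", 0),
    ("Design a quieter vacuum cleaner.", "idk", 1),
]
prompts, responses, labels = zip(*train_data)

# Encode prompt and response as a sentence pair so the model sees both fields.
batch = tokenizer(list(prompts), list(responses), padding=True,
                  truncation=True, return_tensors="pt")
labels = torch.tensor(labels)

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few toy epochs; real training would loop over mini-batches
    outputs = model(**batch, labels=labels)  # cross-entropy loss over 2 classes
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: softmax over the logits gives a calibrated-style score for "invalid".
model.eval()
with torch.no_grad():
    enc = tokenizer("List alternate uses for a brick.", "qwerty uiop",
                    return_tensors="pt")
    p_invalid = torch.softmax(model(**enc).logits, dim=-1)[0, 1].item()
print(f"P(invalid) = {p_invalid:.2f}")

Pairing the prompt with the response, rather than scoring the response alone, lets the classifier judge validity relative to the specific task, which matters for prompt-dependent tasks such as the AUT and DPT.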
