Automated Detection of Invalid Responses to Creativity Assessments

Abstract

Participants in creativity studies sometimes produce invalid data that is unusable for analysis, such as nonsensical or incomplete responses to idea generation tasks. Identifying such responses is time-consuming yet necessary for robust results, and it remains challenging to automate. We explore the efficacy of transformer language models (TLMs) for automatically detecting invalid creativity responses, using only the task prompt and the participant's text response, with no other metadata about the experimental session. We train a suite of transformers to detect invalid data for two creativity assessments: the Alternate Uses Task (AUT) and a design problems task (DPT). We find that transformers generally outperform other baselines on both tasks. Further, we show that TLMs' predictions are well calibrated to the quality of the participant response, so that model failures occur in a predictable way and high-quality responses are unlikely to be labeled invalid. Finally, we conduct a fairness analysis based on language background, along with an adversarial study in which participants attempt to "break" the model by producing invalid responses that are nonetheless labeled valid. Our results demonstrate the potential of deep learning methods for cleaning creativity assessment data, using solely participant responses, in a reliable and unbiased way.
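To make the setup concrete, the sketch below shows how a classifier of this kind could be fine-tuned with the Hugging Face transformers library: the task prompt and the participant response are encoded as a text pair, and a binary head predicts whether the response is invalid. The RoBERTa backbone, the label convention (1 = invalid), and the toy examples are illustrative assumptions, not the authors' actual models or data.

# Minimal sketch (not the paper's code): fine-tune a transformer to flag
# invalid creativity responses from the task prompt + participant response only.
# Backbone, label convention, and example data are assumptions for illustration.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumed backbone; the paper trains a suite of TLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy labeled examples: (task prompt, participant response, label), 1 = invalid.
train_data = [
    ("List alternate uses for a brick.", "paperweight", 0),
    ("List alternate uses for a brick.", "asdf jkl", 1),
    ("Design a quieter vacuum cleaner.", "add sound-dampening foam around the motor", 0),
    ("Design a quieter vacuum cleaner.", "idk", 1),
]
prompts, responses, labels = zip(*train_data)

# Encode prompt and response as a sentence pair so the model sees both fields.
batch = tokenizer(list(prompts), list(responses), padding=True,
                  truncation=True, return_tensors="pt")
labels = torch.tensor(labels)

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few toy epochs; real training would loop over mini-batches
    outputs = model(**batch, labels=labels)  # cross-entropy loss over 2 classes
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: softmax over the logits gives a calibrated-style score for "invalid".
model.eval()
with torch.no_grad():
    enc = tokenizer("List alternate uses for a brick.", "qwerty uiop",
                    return_tensors="pt")
    p_invalid = torch.softmax(model(**enc).logits, dim=-1)[0, 1].item()
print(f"P(invalid) = {p_invalid:.2f}")

Pairing the prompt with the response, rather than scoring the response alone, lets the classifier judge validity relative to the specific task, which matters for prompt-dependent tasks such as the AUT and DPT.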
