Automated Scoring of Creative Achievement
Abstract
The assessment of creative achievement (CA) can be cumbersome: participants are typically asked to respond to long lists of possible accomplishments that may still miss their specific achievements. A bottom-up alternative is to let participants openly report their most significant CAs, which, however, requires more complex scoring, such as human ratings. In this study, we investigated whether language models (LMs) can provide efficient and valid scoring of such open-ended responses. Across two data sets, participants described their three most significant CAs. These responses were rated by human judges and by three LMs (Llama 3.1–8B, Llama 3.3–70B, GPT-4o) using zero-shot prompting. Correlations between human and LM ratings were consistently high (r = .53–.80), and criterion validity evidence for LM-based scores was largely on par with that for rater-based scores. In addition, we examined zero-shot classification of CAs into nine creative domains (e.g., music, visual arts). Overall classification accuracy was 62.3%; closer inspection of the findings suggested that automated classification has the potential to reveal conceptual overlaps between domains and to identify CAs involving multiple domains. Taken together, the findings suggest that automated scoring of CA via LMs represents a promising and efficient alternative to traditional CA measures.
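For readers who want a sense of what the zero-shot pipeline looks like in practice, the snippet below is a minimal sketch of scoring one open-ended CA response with GPT-4o via the OpenAI Python client. The prompt wording and the 1–5 scale are illustrative assumptions, not the authors' actual materials; domain classification follows the same pattern, swapping the rating instruction for one that asks the model to pick one of the nine domain labels.

```python
# Minimal sketch of zero-shot CA scoring with GPT-4o via the OpenAI Python
# client. The prompt text and 1-5 scale are hypothetical, not the authors'
# actual study materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RATING_PROMPT = (
    "You are rating the creativity of a self-reported creative achievement. "
    "On a scale from 1 (not at all creative) to 5 (highly creative), rate the "
    "following achievement. Respond with the number only.\n\n"
    "Achievement: {text}"
)

def score_achievement(text: str) -> int:
    """Return a zero-shot creativity rating for one open-ended response."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # keep scoring as deterministic as the API allows
        messages=[{"role": "user", "content": RATING_PROMPT.format(text=text)}],
    )
    return int(response.choices[0].message.content.strip())

print(score_achievement("I composed and performed an original song cycle."))
```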