Evaluating ChatGPT-4o’s Performance in Construction of Q-Matrix for a Cognitive Diagnostic Assessment

Semih Aşiret
Seçil Ömür Sünbül


Abstract

This study evaluates the performance of ChatGPT-4o in constructing Q-matrices for cognitive diagnostic assessments by comparing its outputs with Q-matrices constructed by the researchers and by human experts. The research examines the overlap rates among these Q-matrices and assesses their validity using empirical methods. Two distinct mathematics datasets were used, and the Q-matrices were validated through statistical techniques to determine their model-data fit. The results indicate that ChatGPT-4o can generate Q-matrices with high overlap rates relative to those specified by human experts, demonstrating its potential as a tool for cognitive diagnostic assessments. The study highlights that AI-generated Q-matrices can be a valuable supplement to traditional methods, but expert validation remains essential to ensure theoretical accuracy and practical applicability. The findings suggest that a hybrid approach, integrating AI-based Q-matrix construction with expert refinement, can enhance the accuracy and efficiency of cognitive diagnostic assessments.
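
The abstract does not specify how the overlap rate between two Q-matrices is computed. One plausible reading, assumed here and not taken from the paper, treats a Q-matrix as a binary items-by-attributes table (1 if the item requires the attribute) and measures agreement in two ways: the share of item-attribute cells on which two matrices agree, and the share of items whose full attribute profile is identical. The minimal Python sketch below follows that assumption. The function names and the example matrices are hypothetical and illustrative only.

```python
from typing import Sequence

# Assumed representation: rows = items, columns = attributes, entries are 0/1.
QMatrix = Sequence[Sequence[int]]


def _check_shapes(a: QMatrix, b: QMatrix) -> None:
    """Raise if the two Q-matrices are empty or do not cover the same items/attributes."""
    if not a or len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("Q-matrices must be non-empty and have the same shape")


def cell_overlap_rate(a: QMatrix, b: QMatrix) -> float:
    """Hypothetical metric: share of item-attribute cells on which a and b agree."""
    _check_shapes(a, b)
    total = sum(len(row) for row in a)
    matches = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return matches / total


def item_overlap_rate(a: QMatrix, b: QMatrix) -> float:
    """Hypothetical metric: share of items whose whole attribute row is identical."""
    _check_shapes(a, b)
    return sum(list(ra) == list(rb) for ra, rb in zip(a, b)) / len(a)


# Illustrative 4-item x 3-attribute Q-matrices (made-up values, not study data).
expert = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
llm = [
    [1, 0, 0],
    [0, 1, 1],  # differs from the expert row in the third attribute
    [1, 1, 0],
    [0, 0, 1],
]

print(cell_overlap_rate(expert, llm))  # 11 of 12 cells agree
print(item_overlap_rate(expert, llm))  # 3 of 4 items match -> 0.75
```

Cell-level agreement rewards partial matches on an item, while item-level agreement counts an item only when every attribute assignment coincides, so the second figure is usually the stricter of the two.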

Version published to 10.21203/rs.3.rs-6235063/v1 on Research Square
Apr 24, 2025

Related articles

Navigating the Maze of Measurement: Large Language Models for objective instrument selection

This article has 2 authors:
1. Viktória Gajdošová
2. Matus Adamkovic
This article has no evaluations. Latest version Dec 19, 2025
Evaluation of ChatGPT Study Mode: Results from an expert survey regarding self-regulated learning

This article has 4 authors:
1. Evelyn Steinberg
2. Franziska Perels
3. Marc Aubreville
4. Laura Dörrenbächer-Ulrich
This article has no evaluations. Latest version Dec 18, 2025
