The Identification of Guessing Patterns in Progress Testing as a Machine Learning Classification Problem

Iván Roselló Atanet
Victoria Sehy
Miriam Sieg
Maren März

Abstract

The detection of guessing patterns in low-stakes progress testing can naturally be framed as a statistical classification problem, in which test takers are assigned to groups according to probabilities given by a machine learning model. However, few studies in the relevant literature discuss this approach; to date, strategies for tackling the problem have mostly relied either on counting rapid responses or on detecting unusual answer patterns. Using 14,897 participations in the Progress Test Medizin (a test that has been held twice a year since 1999 at selected medical schools in Germany, Austria and Switzerland), we formulated the identification of guessing patterns as a binary classification problem. We then compared the performance of a logistic regression algorithm in this setup with that of the nonparametric person-fit indices included in R's PerFit package. Finally, we determined probability thresholds based on the values of the logistic regression functions obtained from the algorithm. The logistic regression algorithm in Python's scikit-learn library reached ROC-AUC scores of 0.886 to 0.903, depending on the dataset, whereas the 11 person-fit indices analysed returned ROC-AUC scores of 0.548 to 0.761. Datasets based on aggregate scores yielded better results than those in which the answers to the individual items were used as separate features. The best results were obtained with a feature set containing only two parameters (self-monitoring accuracy and number of answered questions); adding the time spent on the test did not improve performance. Based on the values of the logistic regression function generated by the algorithm, it is possible to establish thresholds above which a test taker has at least a 90% probability of having guessed most answers.
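The classification setup described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' pipeline: it fits a plain gradient-descent logistic regression (standing in for scikit-learn's `LogisticRegression`) on synthetic data with two hypothetical features that mirror the best feature set (self-monitoring accuracy and number of answered questions), scores it with ROC-AUC, and flags test takers whose predicted probability of guessing is at least 0.9. The data generator, its distributions, the 200-item cap and all function names are assumptions made for illustration; the study's data are not reproduced. Because the predicted probability is the logistic function of a linear score z, the rule p ≥ 0.9 is equivalent to z ≥ ln(0.9/0.1) = ln 9 ≈ 2.197.

```python
import math
import random

# p >= 0.9 on the logistic scale is equivalent to a linear score z >= ln(9).
LOGIT_90 = math.log(0.9 / 0.1)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_data(n=400, seed=0):
    """Hypothetical data: [self-monitoring accuracy, answered questions] plus a 0/1 guessing label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        guessed = rng.random() < 0.3
        if guessed:  # assumption: guessers answer many items with near-chance accuracy
            accuracy, answered = rng.gauss(0.25, 0.08), rng.gauss(170, 25)
        else:
            accuracy, answered = rng.gauss(0.65, 0.12), rng.gauss(110, 35)
        # clamp to plausible ranges (the 200-item cap is an assumption)
        X.append([min(max(accuracy, 0.0), 1.0), min(max(answered, 0.0), 200.0)])
        y.append(int(guessed))
    return X, y


def fit_scaler(X):
    """Return a function that z-scores rows with the column means and SDs of X."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]

    def transform(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

    return transform


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the mean log-loss with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, X):
    """Predicted probability of the 'guessed' class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = make_synthetic_data()
    split = int(0.75 * len(X))
    scale = fit_scaler(X[:split])  # scaling statistics from the training part only
    w, b = fit_logistic(scale(X[:split]), y[:split])
    probs = predict_proba(w, b, scale(X[split:]))
    flagged = [p >= 0.9 for p in probs]
    print(f"test ROC-AUC: {roc_auc(y[split:], probs):.3f}")
    print(f"flagged as likely guessers (p >= 0.9): {sum(flagged)} of {len(flagged)}")
```

With real data, the 0.9 cut-off would be applied to probabilities from a validated model; on synthetic data this well separated, the AUC printed here is expected to be far higher than anything reported in the study and should not be compared with it.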
In this setting, logistic regression clearly outperformed nonparametric person-fit indices in identifying guessing patterns. We attribute this result to the greater flexibility of machine learning methods, which makes them more adaptable than person-fit indices to diverse test environments.
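For comparison, nonparametric person-fit indices score each response pattern directly instead of learning from labelled examples. The sketch below computes one statistic of this family, the normed number of Guttman errors (a Guttman error is a pair of items in which the easier item, judged by proportion correct, is answered wrongly while the harder one is answered correctly), and evaluates it with the same ROC-AUC metric. It is written in plain Python rather than with R's PerFit package; the simulated response model (Rasch-type test takers plus pure guessers with a 0.2 success rate), the test length and every function name are assumptions for illustration, and the output says nothing about the 11 indices or the data analysed in the study.

```python
import math
import random


def guttman_errors(responses, p_values):
    """Count pairs where an easier item (higher p-value) is wrong but a harder item is right."""
    order = sorted(range(len(p_values)), key=lambda i: -p_values[i])  # easiest item first
    errors, wrong_so_far = 0, 0
    for i in order:
        if responses[i] == 1:
            errors += wrong_so_far  # each easier item answered wrongly forms one error with this item
        else:
            wrong_so_far += 1
    return errors


def guttman_normed(responses, p_values):
    """Guttman errors divided by their maximum r * (k - r); 0.0 for all-right or all-wrong patterns."""
    k, r = len(responses), sum(responses)
    if r in (0, k):
        return 0.0
    return guttman_errors(responses, p_values) / (r * (k - r))


def simulate(n_persons=300, n_items=40, seed=1):
    """Hypothetical 0/1 responses: Rasch-type test takers and pure guessers (P(correct) = 0.2)."""
    rng = random.Random(seed)
    difficulties = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
    data, labels = [], []
    for _ in range(n_persons):
        guesser = rng.random() < 0.3
        theta = rng.gauss(0.0, 1.0)
        row = []
        for d in difficulties:
            p = 0.2 if guesser else 1.0 / (1.0 + math.exp(-(theta - d)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
        labels.append(int(guesser))
    return data, labels


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    data, labels = simulate()
    n = len(data)
    p_values = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
    scores = [guttman_normed(row, p_values) for row in data]  # higher = more aberrant pattern
    print(f"ROC-AUC of normed Guttman errors: {roc_auc(labels, scores):.3f}")
```

An index of this kind needs no labelled training data, but it also cannot draw on extra information such as the number of answered questions, which a classifier can simply take as another feature.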

Version history on Research Square:
v3 (10.21203/rs.3.rs-4731140/v3), Oct 30, 2024
v2 (10.21203/rs.3.rs-4731140/v2), Aug 2, 2024
v1 (10.21203/rs.3.rs-4731140/v1), Jul 16, 2024