Mining Transparency: Assessing Open Science Practices in Crime Research Over Time Using Machine Learning


Abstract

This pilot study addresses the current lack of systematic, large-scale evidence on the adoption of Open Science Practices (OSPs) in criminology and legal psychology. It introduces a scalable, machine-learning-based text classification pipeline to map the prevalence of Open Access (OA), Open Data (OD), Open Materials (OM), and Preregistration (PR). The analysis is based on publication metadata and a year-stratified sample of full texts from the top 100 journals in Criminology & Penology, Law, and Psychology (2013-2023). After identifying articles containing statistical inference (SI) with a high-performing classifier, the author used GPT-assisted coding and supervised learning to train separate classifiers for OD, OM, and PR. OA was classified using publicly available metadata. Among 1,763 SI articles with usable full text, design-based estimates reveal a marked disparity in OSP adoption. OA is relatively common (40.9%, 95% CI: 38.8-43.1) and has steadily increased from approximately 20% in 2013 to 50% in 2023. In sharp contrast, trends for OD, OM, and PR cannot be reliably quantified. Extreme class imbalance and the very small number of positive cases indicate a very low true prevalence of these practices in the assessed field. Methodologically, the study confirms that GPT-assisted coding supports accurate SI detection, but robust prevalence estimation for extremely low-frequency OSPs remains challenging for downstream classifiers. Overall, this project establishes a transparent and reproducible pipeline and provides critical baseline estimates for future, larger-scale assessments of research transparency in crime-related fields.
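
The abstract reports design-based prevalence estimates from a year-stratified sample but does not spell out the estimator. As a rough illustration only, the sketch below shows one standard way such an estimate and its 95% confidence interval could be computed: a stratum-weighted proportion with a finite population correction and a normal-approximation interval. The function name `stratified_prevalence`, the tuple layout, and all numbers in the usage example are hypothetical and are not taken from the study.

```python
import math


def stratified_prevalence(strata, z=1.96):
    """Estimate a population proportion from a stratified random sample.

    strata: list of (N_h, n_h, k_h) tuples, one per stratum (e.g. per year):
        N_h = number of articles in the stratum (population size)
        n_h = number of articles sampled from the stratum (must be >= 2)
        k_h = number of sampled articles showing the practice
    Returns (estimate, lower, upper) using a normal-approximation interval.
    This is an assumed textbook estimator, not necessarily the study's own.
    """
    total = sum(N_h for N_h, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for N_h, n_h, k_h in strata:
        weight = N_h / total          # stratum share of the population
        p_h = k_h / n_h               # within-stratum sample proportion
        fpc = 1.0 - n_h / N_h         # finite population correction
        estimate += weight * p_h
        variance += weight ** 2 * fpc * p_h * (1.0 - p_h) / (n_h - 1)
    half_width = z * math.sqrt(variance)
    lower = max(0.0, estimate - half_width)
    upper = min(1.0, estimate + half_width)
    return estimate, lower, upper


# Hypothetical usage: two yearly strata with made-up counts.
example_strata = [(1000, 100, 20), (1000, 100, 50)]
est, lo, hi = stratified_prevalence(example_strata)
print(f"prevalence = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The normal approximation behaves poorly when a stratum has very few positive cases, which is consistent with the abstract's point that prevalence for OD, OM, and PR could not be reliably quantified.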
