Temporal validity of software datasets for code metrics: an empirical assessment of sampling strategies

Abstract

Context: In empirical research, drawing reliable conclusions about a target population requires working with representative samples. Representativeness is the degree to which a sample's properties of interest resemble those of the target population. However, a sample that was representative in the past may no longer be representative today if the population has evolved significantly in the interim.

Objective: To evaluate the effectiveness of a dataset extraction tool for collecting current samples of software repositories and maintaining their temporal validity over time.

Method: We performed a Mining Software Repositories study using three datasets: Tempero et al.'s Qualitas Corpus, a sample from GitHub, and an updated version of the Qualitas Corpus. From these datasets, we derived thresholds for three source code metrics (Lines of Code, Cyclomatic Complexity, and Weighted Methods per Class) and compared whether the thresholds yielded consistent results.

Results: We observed significant differences in all three source code metrics when comparing the Qualitas Corpus against samples containing projects with recent development data, with the former registering higher thresholds. Furthermore, the samples collected with our extraction tool yielded consistent thresholds across collections.

Conclusions: Using outdated code-based datasets in empirical studies can affect study results; it is therefore important that researchers not only publish their datasets but also provide strategies to keep them up to date. In addition, we presented and validated the implemented sampling approaches, demonstrating their effectiveness at collecting current samples.
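To make the method concrete, the sketch below shows one common way to derive metric thresholds from a sample: take fixed percentiles of the observed metric distribution. This is an illustrative assumption, not necessarily the paper's exact procedure; the function name, quantile choices, and the toy cyclomatic-complexity values are all hypothetical.

```python
def metric_thresholds(values, quantiles=(0.70, 0.80, 0.90)):
    """Derive percentile-based thresholds from a sample of metric values.

    A simple sketch of threshold derivation: sort the observed values
    and read off the chosen percentiles. The quantile levels here are
    illustrative, not the ones used in the study.
    """
    ordered = sorted(values)
    n = len(ordered)
    return tuple(ordered[min(n - 1, int(q * n))] for q in quantiles)

# Hypothetical cyclomatic-complexity samples, illustrating how an older
# corpus can register higher thresholds than recently mined projects.
older_sample = [1, 2, 2, 3, 4, 5, 6, 8, 12, 20]    # e.g. legacy corpus
recent_sample = [1, 1, 2, 2, 3, 3, 4, 5, 7, 10]    # e.g. fresh GitHub sample

print(metric_thresholds(older_sample))   # → (8, 12, 20)
print(metric_thresholds(recent_sample))  # → (5, 7, 10)
```

Comparing the two tuples mirrors the paper's comparison: if the population has drifted, thresholds computed from an outdated sample will disagree with those from a current one.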
