QModel: A Time-Aware GitHub Mining Framework for Empirical Software Quality Studies

Dmytro Polishchuk

Abstract

Empirical studies in software engineering frequently rely on ad hoc scripts to mine GitHub data, which makes metrics hard to compare and results difficult to reproduce. This paper presents QModel, an open-source framework that automatically collects and links repository information about commits (modeled as a directed acyclic graph, DAG), pull requests, issues, timelines, file changes, and user reactions into a consistent relational schema designed for quality analysis. Its companion module, QModel Compilation, turns SQL queries over this schema into executable analyses by generating feature-target datasets and running statistical or machine-learning strategies (correlation, regression, PCA, random forest, and others). Together, the tools provide an end-to-end, containerized pipeline that allows researchers and practitioners to define quality hypotheses in SQL, recreate analyses across projects, and explore how process and structural characteristics (e.g., branching depth, merge activity, developer responsiveness) relate to outcomes such as review time and defect density. We illustrate the framework on long-lived GitHub projects, combining time-aware graph metrics with SZZ-style defect linking, and show how metrics of bug-introducing commits can serve as lightweight proxies for process bottlenecks and delayed defect handling in distributed development. All source code, container images, and replication notebooks are publicly available, supporting the community goal of transparent, reusable, and extensible research on software quality.

Version published to 10.21203/rs.3.rs-8478733/v1 on Research Square
Jan 12, 2026

Related articles

A Discovery Technique for Expressive Yet Sound Process Models

This article has 3 authors:
1. Humam Kourani
2. Gyunam Park
3. Wil M.P. van der Aalst
This article has no evaluations. Latest version Jan 12, 2026

Insightimate: Enhancing Software Effort Estimation Accuracy Using Machine Learning Across Three Schemas (LOC/FP/UCP)

This article has 6 authors:
1. Nguyen Nhat Huy
2. Duc Man Nguyen
3. Dang Nhat Minh
4. Nguyen Thuy Giang
5. P. W. C. Prasad
6. Md Shohel Sayeed
This article has no evaluations. Latest version Feb 2, 2026

LLM model for ESG Reporting

This article has 1 author:
1. Al Khan
This article has no evaluations. Latest version Jan 21, 2026
