Detecting Duplicates in Bug Tracking Systems with Artificial Intelligence: A Combined Retrieval and Classification Approach
Abstract
The presence of duplicate bug reports in defect tracking systems places an additional burden on software engineering specialists and can delay the fixing of critical bugs. Automated duplicate detection relieves this burden and reduces the time and cost of processing the reports. Detecting duplicate bug reports in large databases is a challenging task that requires a balance between computational efficiency and prediction accuracy. Traditional approaches either rely on resource-intensive searches or use classification models that, while highly accurate, compromise performance. This paper proposes a new approach to automatic duplicate bug detection based on a two-level analysis of the textual features of reports. The first stage vectorises the text data using the BERT (Bidirectional Encoder Representations from Transformers), MiniLM (Miniature Language Model) and MPNet (Masked and Permuted Pre-training for Language Understanding) transformer models, which capture the semantic similarity between defect descriptions. This narrows the set of potential duplicates and reduces the volume of reports that need to be compared. The second stage classifies pairs of potential duplicates using machine learning algorithms, including XGBoost (eXtreme Gradient Boosting), SVM (Support Vector Machines) and logistic regression. The models are trained on vector representations of the text to assess the degree of similarity between reports. Combining transformer models with classical classification algorithms yields high accuracy in duplicate detection while significantly reducing query processing time. The experimental results confirm the effectiveness of the approach, demonstrating its ability to reduce the number of required comparisons, cut the cost of analysing defect reports, and achieve sufficient accuracy in duplicate detection.
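As a rough illustration of the two-stage pipeline described above, the sketch below combines the sentence-transformers and scikit-learn libraries: a MiniLM encoder retrieves candidate duplicates by cosine similarity, and a logistic regression classifier scores each candidate pair. The model name, top-k cutoff, pairwise features, and toy data are assumptions made for illustration only and do not reflect the paper's exact configuration.

```python
# Minimal sketch of a retrieval-then-classification duplicate detector.
# Assumes the sentence-transformers and scikit-learn packages; all names,
# thresholds and toy labels below are illustrative, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Stage 1: embed report texts and retrieve candidate duplicates by cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # MiniLM variant; MPNet/BERT models are drop-in alternatives

def candidate_pairs(reports, top_k=5):
    """Return embeddings and (i, j, similarity) triples for each report's top_k nearest reports."""
    embeddings = encoder.encode(reports, convert_to_numpy=True, normalize_embeddings=True)
    sims = cosine_similarity(embeddings)
    np.fill_diagonal(sims, -1.0)  # exclude self-matches
    pairs = []
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:top_k]:
            pairs.append((i, int(j), float(row[j])))
    return embeddings, pairs

# Stage 2: build simple pairwise features and classify candidates as duplicate / non-duplicate.
def pair_features(embeddings, pairs):
    """Element-wise |difference| of the two embeddings plus the retrieval similarity."""
    return np.array([
        np.concatenate([np.abs(embeddings[i] - embeddings[j]), [sim]])
        for i, j, sim in pairs
    ])

# Illustrative usage with toy reports; real labels would come from the tracker's duplicate links.
reports = [
    "App crashes when opening the settings page",
    "Crash on settings screen after the latest update",
    "Login button does nothing on first tap",
]
embeddings, pairs = candidate_pairs(reports, top_k=2)
X = pair_features(embeddings, pairs)
y = np.array([1, 0, 1, 0, 0, 0])  # hypothetical labels, one per candidate pair
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X)[:, 1])  # duplicate probability for each candidate pair
```

In this arrangement the encoder limits the expensive pairwise comparison to a small candidate set, while the lightweight classifier (logistic regression here; SVM or XGBoost could be substituted) makes the final duplicate decision on each retrieved pair.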