An Empirical Study of the Comparison of Task Recommendation Techniques and Similar Source Code in Open Source Software Projects

Getúlio Coimbra Regis
Igor Wiese
Ivanilton Polato
Marco Aurélio Graciotto Silva
Reginaldo Ré
Walter Nakamura
Igor Steinmacher

Abstract

Context: Managing issues in open-source software projects is challenging and costly, as many developers are casual contributors and/or newcomers. On the one hand, maintainers must ensure the quality of issue descriptions and their labels and create mechanisms for recommending and assigning issues. On the other hand, to resolve an issue, contributors must understand it and locate the artifacts related to a given functionality or to the defect to be fixed. Objectives: This work aimed to conduct a comparative study of different models for recommending similar issues that could help developers with their contributions. Methods: We collected data on issues and pull requests from 35 open-source projects hosted on GitHub. We used the Term Frequency-Inverse Document Frequency (TF-IDF), Sentence-BERT (SBERT), and Word2Vec techniques to recommend similar issues and source code to assist newcomers' contributions. Results: The models based on the SBERT and TF-IDF techniques yielded better recommendations than Word2Vec in the two evaluated scenarios (general issues and those marked as good for newcomers). SBERT was able to recommend past issues where the code used in the solution was approximately 17% similar to the actual solution of the issue used as a query to evaluate the models, reaching results similar to those of GPT-3.5 and GPT-4. Conclusion: Based on the empirical results obtained, we hope to take the next steps in transferring the knowledge gained to software projects and developers, especially by supporting newcomer developers during their first contribution.
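To make the Methods step more concrete, the following is a minimal sketch of how TF-IDF can rank past issues by textual similarity to a new one. It is an illustration only and not the authors' pipeline: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend_similar`), the smoothed IDF formula, the tokenization rule, and the sample issue titles are all assumptions introduced here, and the abstract does not describe the preprocessing or weighting actually used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (an assumed, simplistic rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document stay positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {term: freq * idf[term] for term, freq in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(query, past_issues, top_k=3):
    """Rank past issues by TF-IDF cosine similarity to the query issue text."""
    # For simplicity the query is vectorized together with the past issues.
    vectors = tfidf_vectors(past_issues + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(past_issues[i], score) for score, i in scored[:top_k]]


# Hypothetical issue titles, invented for this example.
past_issues = [
    "Login page crashes when password field is empty",
    "Add dark mode to settings screen",
    "Crash on login with empty password",
]
ranking = recommend_similar(
    "App crashes on login if password is empty", past_issues, top_k=2
)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

In this sketch the two login-related issues rank ahead of the unrelated dark-mode issue because they share terms with the query. An embedding-based approach such as SBERT would replace the sparse TF-IDF vectors with dense sentence embeddings while keeping the same cosine-similarity ranking step.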

Version published to 10.21203/rs.3.rs-6322361/v1 on Research Square
Apr 14, 2025

Extend(ed)ing work to get Academics the credit they deserve using the Wikimedia Impact Tracker

This article has 1 author:
1. Brett Buttliere
This article has no evaluations
Latest version May 11, 2025

LEDGE : Leveraging Dependency Graphs for Enhanced Context Aware Documentation Generation

This article has 6 authors:
1. Mihir Panchal
2. Arnav Deo
3. Varad Prabhu
4. Prinkal Doshi
5. Chetashri Bhadane
6. Pranit Bari
This article has no evaluations
Latest version Jun 6, 2025

Challenges of Deploying Code Embeddings in Industry

This article has 4 authors:
1. Benedikt Fein
2. Maximilian Jungwirth
3. Gordon Fraser
4. Florian Kandlinger
This article has no evaluations
Latest version May 8, 2025
