An Empirical Study of the Comparison of Task Recommendation Techniques and Similar Source Code in Open Source Software Projects

Abstract

Context: Managing issues in open-source software projects is challenging and costly, because many contributors are casual developers, newcomers, or both. On the one hand, maintainers must ensure the quality of issue descriptions and labels and create mechanisms for recommending and assigning issues. On the other hand, to resolve an issue, contributors must understand it and locate the artifacts related to the functionality to be changed or the defect to be fixed. Objectives: This work presents a comparative study of models for recommending similar issues that could help developers with their contributions. Methods: We collected data on issues and pull requests from 35 open-source projects hosted on GitHub. We used Term Frequency-Inverse Document Frequency (TF-IDF), Sentence-BERT (SBERT), and Word2Vec to recommend similar issues and source code to assist newcomers' contributions. Results: The models based on SBERT and TF-IDF produced better recommendations than Word2Vec in the two evaluated scenarios (general issues and issues marked as good for newcomers). SBERT recommended past issues whose solution code was approximately 17% similar to the actual solution of the issue used as the query to evaluate the models, a result comparable to those of GPT-3.5 and GPT-4. Conclusion: Based on these empirical results, we hope to take the next steps in transferring the knowledge gained to software projects and developers, especially by supporting newcomer developers during their first contribution.
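
To make the TF-IDF approach named in the Methods more concrete, the following is a minimal sketch of ranking past issues by textual similarity to a new one. It is not the authors' implementation: the function names (tokenize, tfidf_vectors, cosine, recommend_similar), the simple tokenizer, the smoothed IDF formula, and the sample issue titles are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption here), so that very common terms keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(query, past_issues, top_k=3):
    """Rank past issues by TF-IDF cosine similarity to the query issue text."""
    vectors = tfidf_vectors(past_issues + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(vectors[:-1])]
    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(past_issues[idx], score) for score, idx in scored[:top_k]]


# Hypothetical issue titles, not taken from the studied projects.
past_issues = [
    "Fix crash when parsing empty config file",
    "Add dark mode to settings page",
    "Update documentation for install steps",
]
query = "Crash while parsing config file with no content"
ranked = recommend_similar(query, past_issues)
for title, score in ranked:
    print(f"{score:.3f}  {title}")
```

In this sketch the query shares terms only with the first past issue, so that issue ranks first and the other two score zero. An embedding-based variant in the spirit of SBERT or Word2Vec would replace tfidf_vectors with dense vectors and keep the same cosine ranking step.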
