An Empirical Study of the Comparison of Task Recommendation Techniques and Similar Source Code in Open Source Software Projects
Abstract
Context: Managing issues in open-source software projects is challenging and costly because many contributors are casual developers, newcomers, or both. On the one hand, maintainers must ensure the quality of issue descriptions and labels and create mechanisms for recommending and assigning issues. On the other hand, to resolve an issue, contributors must understand it and locate the artifacts related to the functionality involved or the defect to be fixed. Objectives: This work aimed to conduct a comparative study of different models for recommending similar issues that could help developers with their contributions. Methods: We collected data on issues and pull requests from 35 open-source projects hosted on GitHub. We used Term Frequency-Inverse Document Frequency (TF-IDF), Sentence-BERT (SBERT), and Word2Vec to recommend similar issues and source code to assist newcomers' contributions. Results: The models based on SBERT and TF-IDF produced better recommendations than Word2Vec in the two evaluated scenarios (general issues and those marked as good for newcomers). SBERT recommended past issues whose solution code was approximately 17% similar to the actual solution of the issue used as the query to evaluate the models, a result similar to those of GPT-3.5 and GPT-4. Conclusion: Based on these empirical results, we hope to take the next steps in transferring the knowledge gained to software projects and developers, especially by supporting newcomer developers during their first contribution.
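To make the Methods concrete, the following is a minimal sketch of how a TF-IDF model can rank past issues by similarity to a new one. It is an illustration only: the abstract does not describe the study's preprocessing or pipeline, so the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `build_tfidf`, `recommend`), and the sample issue texts are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector, ignoring terms unseen in the corpus."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def build_tfidf(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) keeps every weight strictly positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, past_issues, top_k=3):
    """Return (index, score) pairs for the top_k past issues most similar to the query."""
    vectors, idf = build_tfidf(past_issues)
    query_vec = vectorize(tokenize(query), idf)
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    # Highest score first; ties broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical issue texts, invented for this example.
past_issues = [
    "Crash when opening settings dialog on Windows",
    "Add dark mode option to the settings dialog",
    "Typo in README installation instructions",
]
query = "App crashes after opening settings dialog"

for index, score in recommend(query, past_issues):
    print(f"{score:.3f}  {past_issues[index]}")
```

With these sample texts, the crash report ranks first, the dark-mode request second, and the README typo last, since it shares no terms with the query. Note that "crashes" does not match "crash" here because the sketch does no stemming. An embedding model such as SBERT would replace the `vectorize` step with dense sentence vectors while keeping the same cosine-based ranking.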