Boosting Commit Classification with Contrastive Learning
Abstract
Commit Classification (CC) is an important task in software maintenance that helps developers classify commit changes into different types according to their nature and purpose. However, existing models need large amounts of manually labeled data for fine-tuning, and maintaining classification performance becomes a critical challenge when training samples are insufficient. The scarcity of data also leads to poor generalization, so that models perform satisfactorily only on specific tasks. Moreover, existing models often ignore the sentence-level semantic information in the commit message, which is essential for distinguishing between diverse commits, especially in few-shot scenarios. In this work, we propose to boost commit classification with contrastive learning, targeting the CC problem in few-shot scenarios. To augment the training datasets and improve the generalization ability of our method, we generate additional training samples using a Semantic Prototype, defined as a representative embedding for a group of semantically similar instances. To produce meaningful and discriminative sentence-level vectors for each commit in a pair, we employ a pretrained Sentence-Transformer as the embedding layer. The network then learns to minimize the distance in the latent space for positive pairs and maximize it for negative pairs, yielding a fine-tuned Sentence-Transformer whose weights are then fixed for the downstream commit classification task. Extensive experiments on two publicly available datasets demonstrate that our framework, though simple, can solve the CC problem effectively even in few-shot scenarios. It not only achieves state-of-the-art performance but also improves the adaptability of the model without requiring a large number of training samples for fine-tuning. The code, data, and trained models are available at https://github.com/CUMT-GMSC/CommitFit.
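The pairwise objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it assumes a classic margin-based contrastive loss over Euclidean distance, takes the semantic prototype to be the element-wise mean of a group's embeddings, and uses toy 2-D vectors in place of Sentence-Transformer outputs. The function names, the labels "corrective" and "adaptive", and the margin value are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def mean_embedding(vectors):
    """Assumed form of a 'semantic prototype': the element-wise mean
    of the embeddings in one group of similar instances."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def make_pairs(samples):
    """Turn a few labeled samples into (u, v, is_positive) pairs.
    A pair is positive when both samples share the same label."""
    return [(u, v, lu == lv) for (u, lu), (v, lv) in combinations(samples, 2)]


def contrastive_loss(u, v, is_positive, margin=1.0):
    """Margin-based contrastive loss (an assumption, not confirmed by the source):
    positive pairs are pulled together, negative pairs are pushed
    apart until they are at least `margin` away from each other."""
    d = euclidean(u, v)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy stand-ins for sentence embeddings of commit messages (hypothetical data).
samples = [
    ([0.0, 0.0], "corrective"),
    ([0.0, 0.2], "corrective"),
    ([1.0, 1.0], "adaptive"),
]

pairs = make_pairs(samples)
total = sum(contrastive_loss(u, v, pos) for u, v, pos in pairs)
prototype = mean_embedding([vec for vec, label in samples if label == "corrective"])
print("pairs:", len(pairs), "total loss:", total, "prototype:", prototype)
```

With n labeled samples, pairing yields n(n-1)/2 training pairs, which is one reason pair-based fine-tuning suits few-shot settings: a handful of labeled commits already produces a much larger set of training signals.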