Challenges of Deploying Code Embeddings in Industry
Abstract
The recent hype around machine learning has fully captured software engineering research. Correspondingly, a variety of ways to represent code as input to deep learning models have been proposed. These code embedding models are usually evaluated in terms of common metrics such as accuracy or BLEU scores, on benchmark tasks such as predicting method names from their bodies. Although this evaluation approach is well established in research, it leaves open challenges for the deployment of these models in practice: First, comparing accuracy on standardised benchmark results conveniently avoids some of the challenges of actually running different prototype model implementations, which, however, is necessary to apply the models in practice. Second, the models are usually trained and evaluated on abundantly available open-source training data, which may be very different from closed-source industrial code. Third, the deployment of machine learning models in an industrial environment entails not only technical but also organisational challenges. Finally, while competitive accuracy or BLEU scores may be indicative of relative model performance, they may not reflect to what extent the models are suitable for use by developers. In this paper we describe our experience of deploying and evaluating state-of-research code embedding models in an industrial environment, and present lessons learned from our struggles with each of these challenges.