Challenges of Deploying Code Embeddings in Industry
Abstract
The recent hype around machine learning has fully captured software engineering research. Correspondingly, a variety of ways to represent code as input to deep learning models have been proposed. These code embedding models are usually evaluated in terms of common metrics such as accuracy or BLEU scores, on benchmark tasks such as predicting method names from their bodies. Although this evaluation approach is well established in research, it leaves open challenges for the deployment of these models in practice: First, comparing accuracy on standardised benchmark results conveniently avoids some of the challenges of actually running different prototype model implementations, which, however, is necessary to apply the models in practice. Second, the models are usually trained and evaluated on abundantly available open-source training data, which may be very different from closed-source industrial code. Third, the deployment of machine learning models in an industrial environment entails not only technical but also organisational challenges. Finally, while competitive accuracy or BLEU scores may be indicative of relative model performance, they may not reflect to what extent the models are suitable for use by developers. In this paper we describe our experience of deploying and evaluating state-of-research code embedding models in an industrial environment, and present lessons learned from our struggles with each of these challenges.