Boosting Code Retrieval Performance on Q&A Sites with Neural Mutual Attentions


Abstract

Code retrieval aims to identify the most relevant code fragments in response to natural language queries and plays an important role in supporting software development on software Q&A platforms. However, effectively modeling the semantic relationship between queries and code remains challenging, particularly due to noisy and incomplete code snippets. To address this challenge, we propose NMAC, a Neural Mutual Attention approach for code retrieval from Q&A sites. The proposed model combines distributed word representations with a Bi-directional Long Short-Term Memory (Bi-LSTM) network and a mutual attention mechanism to explicitly capture fine-grained interactions between query titles and code fragments. NMAC analyzes key code elements, including variables, functions, documentation strings, and comments, to assess semantic relevance more accurately. The effectiveness of the proposed framework is evaluated on a benchmark dataset derived from a software Q&A platform. Experimental results demonstrate that NMAC outperforms existing baseline approaches in terms of retrieval accuracy.
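The mutual attention idea described in the abstract can be sketched in a few lines. The following is an illustrative approximation only: the abstract does not specify the scoring function or pooling, so the dot-product interaction matrix, max-pooling over rows/columns, and all function names here are assumptions, and random arrays stand in for the Bi-LSTM hidden states of the query title and code fragment.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(Q, C):
    """Q: (m, d) query token states; C: (n, d) code token states.
    Returns attention-pooled vectors for each side.
    Dot-product scoring and max-pooling are illustrative choices,
    not necessarily those used by NMAC."""
    A = Q @ C.T                          # (m, n) pairwise interaction scores
    q_weights = softmax(A.max(axis=1))   # importance of each query token w.r.t. code
    c_weights = softmax(A.max(axis=0))   # importance of each code token w.r.t. query
    return q_weights @ Q, c_weights @ C  # two (d,) pooled representations

def relevance(q_vec, c_vec):
    # Cosine similarity as a simple query-code relevance score.
    return float(q_vec @ c_vec / (np.linalg.norm(q_vec) * np.linalg.norm(c_vec)))

# Placeholder "hidden states" in lieu of real Bi-LSTM outputs.
rng = np.random.default_rng(0)
query_states = rng.standard_normal((5, 8))   # 5 query tokens, dim 8
code_states = rng.standard_normal((7, 8))    # 7 code tokens, dim 8
q_vec, c_vec = mutual_attention(query_states, code_states)
score = relevance(q_vec, c_vec)
```

Ranking candidate code fragments would then amount to computing `score` against each fragment and sorting; in the actual model the token states would come from the trained embedding and Bi-LSTM layers rather than random arrays.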