Domain-Agnostic Translation of Natural Language Text to Cypher Query Language for GraphRAG

Abstract

GraphRAG is a retrieval-augmented generation (RAG) framework that leverages knowledge graphs. Among the knowledge retrieval techniques used with GraphRAG, subgraph retrieval via Cypher queries is employed by SubGraph Retrieval Augmented Generation (SG-RAG). However, SG-RAG relies on manually crafted Cypher templates, which limits its practicality and scalability in real-world applications. To address this limitation, we propose a domain-agnostic Text-to-Cypher (Text2Cypher) translation model as a flexible subgraph retrieval mechanism for SG-RAG and other GraphRAG-based methods. Because no large-scale, multi-domain Text2Cypher dataset exists, we generate a synthetic multi-domain Text2Cypher dataset and fine-tune a large language model (LLM) on it. Furthermore, we introduce a GPT-based evaluation metric that does not require access to a populated graph database. We evaluate the fine-tuned model on both the generated dataset and the MetaQA benchmark. Experimental results show that our model significantly outperforms both open-source generative LLMs across multiple few-shot settings and the Text2Cypher model proposed by Neo4j. Finally, we analyze the correlation between the proposed GPT-based metric and execution-based F1 scores on MetaQA using the Pearson correlation coefficient, finding a strong positive correlation.
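As an illustration of the final validation step described above: the analysis amounts to computing the Pearson correlation coefficient between the two per-query scores. A minimal sketch, with invented score values standing in for the GPT-based metric and execution-based F1 (the actual scores are not given in the abstract):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-query scores for illustration only:
# GPT-based metric vs. execution-based F1 on the same queries.
gpt_scores = [0.9, 0.7, 0.4, 0.8, 0.2]
exec_f1 = [1.0, 0.6, 0.5, 0.9, 0.1]

r = pearson(gpt_scores, exec_f1)
# A value of r close to +1 would indicate the strong positive
# correlation the authors report between the two metrics.
```

A high coefficient on real data would support using the GPT-based metric as a cheaper proxy for execution-based evaluation when no populated graph database is available.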
