Domain-Agnostic Translation of Natural Language Text to Cypher Query Language for GraphRAG
Abstract
GraphRAG is a retrieval-augmented generation (RAG) framework that leverages knowledge graphs. Among the knowledge retrieval techniques used with GraphRAG, subgraph retrieval via Cypher queries is employed by SubGraph Retrieval Augmented Generation (SG-RAG). However, SG-RAG relies on manually crafted Cypher templates, which limits its practicality and scalability in real-world applications. To address this limitation, we propose a domain-agnostic Text-to-Cypher (Text2Cypher) translation model as a flexible subgraph retrieval mechanism for SG-RAG and other GraphRAG-based methods. Because no large-scale, multi-domain Text2Cypher dataset exists, we generate a synthetic multi-domain Text2Cypher dataset and fine-tune a large language model (LLM) on it. Furthermore, we introduce a GPT-based evaluation metric that does not require access to a populated graph database. We evaluate the fine-tuned model on both the generated dataset and the MetaQA benchmark. Experimental results demonstrate that our model significantly outperforms both open-source generative LLMs across multiple few-shot settings and the Text2Cypher model proposed by Neo4j. Finally, we analyze the relationship between the proposed GPT-based evaluation metric and execution-based F1 scores on MetaQA using the Pearson correlation coefficient, revealing a strong positive correlation.