Guided Query Navigation over Heterogeneous Multi-Source Databases: Pre-Validated SQL Routing as an Alternative to Free-Form NL-to-SQL
Abstract
Natural language interfaces to relational databases have long promised to democratize data access. In practice, free-form NL-to-SQL systems fail systematically: users formulate queries that reference entities that do not exist, use vocabulary that does not match column values, or request aggregations over ranges absent from the data. The root cause is an information asymmetry: users cannot know what is queryable without inspecting the database, yet inspection requires the technical knowledge they lack. We present Guided Query Navigation (GQN), an architecture that resolves this asymmetry by inverting the NL-to-SQL direction. Rather than translating arbitrary user questions to SQL at inference time, GQN generates a curated catalog of natural language questions guaranteed to be answerable given the actual database content, each pre-paired with a validated SQL translation stored in a Redis key-value cache. At inference time, query resolution is an O(1) key lookup, which bypasses LLM inference entirely for catalog hits and eliminates the primary failure point of free-form translation. GQN extends this principle to heterogeneous multi-source environments, connecting simultaneously to Oracle, SQL Server, PostgreSQL, MySQL, and Weaviate vector collections. Production evaluation demonstrates a correctness improvement of 21.3 percentage points over a free-form GPT-4o NL-to-SQL baseline, with sub-millisecond SQL resolution for catalog hits.
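The catalog lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: an in-memory dict stands in for the Redis cache, and the key scheme (`gqn:q:` prefix plus a SHA-256 hash of the normalized question), the `QueryCatalog` class, and the example question and SQL are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Dict, Optional


def catalog_key(question: str) -> str:
    """Build a cache key from a question (hypothetical key scheme).

    Lowercasing and collapsing whitespace lets trivially different
    spellings of the same catalog question map to the same key.
    """
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return "gqn:q:" + digest


class QueryCatalog:
    """Maps pre-validated questions to their SQL translations.

    Assumption: a plain dict stands in for the Redis key-value cache;
    both offer average-case O(1) lookup by key.
    """

    def __init__(self, store: Optional[Dict[str, str]] = None) -> None:
        self._store: Dict[str, str] = {} if store is None else store

    def register(self, question: str, sql: str) -> None:
        # Done offline, after the SQL has been validated against the database.
        self._store[catalog_key(question)] = sql

    def resolve(self, question: str) -> Optional[str]:
        # Inference-time path: a single key lookup, no LLM call.
        # Returns None on a catalog miss.
        return self._store.get(catalog_key(question))


if __name__ == "__main__":
    catalog = QueryCatalog()
    catalog.register(
        "How many orders were placed in 2023?",
        "SELECT COUNT(*) FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2023",
    )
    print(catalog.resolve("how many orders were placed in 2023?"))
    print(catalog.resolve("What is the weather today?"))
```

In this sketch a miss simply returns `None`; how a production system handles questions outside the catalog is not specified in the abstract and is left open here.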