FGCSQL: A Three-Stage Pipeline for Large Language Model-driven Chinese Text-to-SQL

Abstract

Recent advances in large language models (LLMs) have driven major breakthroughs in Text-to-SQL tasks. However, several challenges still hinder the use of SQL parsers in cross-language settings. In this article, we introduce FGCSQL, a novel three-stage pipeline framework that addresses three of them: cross-language schema linking, exploiting the SQL parsing potential of LLMs, and error propagation in SQL parsers. First, a Filtering Encoder eliminates irrelevant database schema items. Second, a pre-trained generative LLM, fine-tuned on a carefully structured dataset, performs the SQL parsing. Finally, a Correcting Decoder addresses error propagation, yielding a robust system for semantic parsing tasks. Tested on the CSpider dataset, FGCSQL shows a substantial improvement in Exact-set-Match (EM) accuracy and EXecution (EX) accuracy, which supports the effectiveness of the pipeline architecture in mitigating the challenges typically encountered in Text-to-SQL conversion, especially in cross-lingual contexts. FGCSQL outperforms existing methods in execution accuracy, indicating the validity of the proposed method.
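
The three stages described above can be pictured as a simple function composition. The sketch below is illustrative only and is not the authors' implementation: the abstract does not specify models, prompts, or thresholds, so every name here (`filter_schema`, `run_pipeline`, `toy_score`, `toy_generate`, `toy_correct`, the `LEXICON` table, and the 0.5 threshold) is a hypothetical stand-in. A keyword lookup replaces the Filtering Encoder, a fixed template replaces the fine-tuned LLM, and fuzzy string matching replaces the Correcting Decoder.

```python
import difflib
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names

# Hypothetical Chinese-to-English hints; a real system would use a trained encoder.
LEXICON = {"歌手": "singer", "名字": "name"}


def toy_score(question: str, item: str) -> float:
    """Stand-in relevance score: 1.0 if a hinted English word occurs in the item."""
    hints = [en for zh, en in LEXICON.items() if zh in question]
    return 1.0 if any(h in item.lower() for h in hints) else 0.0


def filter_schema(question: str, schema: Schema,
                  score: Callable[[str, str], float],
                  threshold: float = 0.5) -> Schema:
    """Stage 1 (assumed shape): drop schema items scored as irrelevant."""
    kept: Schema = {}
    for table, columns in schema.items():
        cols = [c for c in columns if score(question, c) >= threshold]
        if cols:
            kept[table] = cols
        elif score(question, table) >= threshold:
            kept[table] = list(columns)  # relevant table, keep all its columns
    return kept


def toy_generate(question: str, schema: Schema) -> str:
    """Stage 2 stand-in: emits a draft containing a deliberate column typo."""
    table = next(iter(schema))
    return f"SELECT names FROM {table}"


def toy_correct(question: str, schema: Schema, draft: str) -> str:
    """Stage 3 stand-in: map unknown identifiers to the closest schema name."""
    keywords = {"select", "from", "where"}
    names = list(schema) + [c for cols in schema.values() for c in cols]
    fixed = []
    for token in draft.split():
        if token.lower() in keywords or token in names:
            fixed.append(token)
            continue
        match = difflib.get_close_matches(token, names, n=1)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


def run_pipeline(question: str, schema: Schema, score, generate, correct) -> str:
    """Filter the schema, generate a draft query, then correct the draft."""
    pruned = filter_schema(question, schema, score)
    draft = generate(question, pruned)
    return correct(question, pruned, draft)


question = "列出所有歌手的名字"  # "List the names of all singers"
schema = {"singer": ["singer_id", "name", "age"],
          "concert": ["concert_id", "venue"]}
sql = run_pipeline(question, schema, toy_score, toy_generate, toy_correct)
print(sql)
```

Passing each stage in as a callable keeps the sketch independent of any particular model, so a learned component could replace a toy one without changing `run_pipeline`.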
