SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement

arXiv - CS - Databases · Published: 2024-08-09 · arXiv:2408.04919

Chaofan Li, Yingxia Shao, Zheng Liu


Citations: 0

Abstract




Recent advances in large language models (LLMs) have substantially advanced the Text-to-SQL task. Many of these works require post-correction of the generated SQL queries. However, this correction is mostly done by analyzing error cases and writing prompts with rules that eliminate model bias, and the SQL queries are not verified by execution. In addition, the prevalent techniques rely mainly on GPT-4 and few-shot prompts, which makes them expensive. To investigate effective and cost-efficient methods for SQL refinement, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL), which comprises Adaptive Bias Elimination and Dynamic Execution Adjustment and aims to improve performance while minimizing resource expenditure with zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced schema to augment database information and optimize SQL queries. During SQL query generation, a fine-tuned adaptive bias eliminator is applied to mitigate inherent biases introduced by the LLM. Dynamic execution adjustment is then used to guarantee that the bias-eliminated SQL query is executable. We conduct experiments on the Spider and BIRD datasets to demonstrate the effectiveness of this framework. The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 setting at 9%-58% of the generation cost. Furthermore, SEA-SQL is comparable to GPT-4 at only 0.9%-5.3% of the generation cost.
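The abstract only names the Dynamic Execution Adjustment step without describing its mechanics. As a rough illustration of the general execute-then-repair idea it points to, here is a minimal Python sketch that assumes a SQLite database and a hypothetical `repair` callable standing in for the LLM prompt. The function names, the retry budget `max_rounds`, and the error-feedback format are all assumptions for illustration, not the authors' implementation.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# Hypothetical repair step (stands in for an LLM prompt, not the paper's API):
# takes (question, failing_sql, error_message) and returns a revised query.
RepairFn = Callable[[str, str, str], str]


def execution_error(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Run `sql` and return None if it executes, else the database error message."""
    try:
        conn.execute(sql).fetchall()
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)


def adjust_until_executable(
    conn: sqlite3.Connection,
    question: str,
    sql: str,
    repair: RepairFn,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Feed execution errors back to `repair` for at most `max_rounds` attempts.

    Returns the last query tried and whether it executed successfully.
    """
    for _ in range(max_rounds):
        error = execution_error(conn, sql)
        if error is None:
            return sql, True
        sql = repair(question, sql, error)
    # Budget spent: report whether the final revision happens to execute.
    return sql, execution_error(conn, sql) is None


# Usage with a toy database and a rule-based stand-in for the LLM repair call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 25)])


def toy_repair(question: str, sql: str, error: str) -> str:
    # Only fixes one known typo; a real system would prompt an LLM with `error`.
    return sql.replace("agee", "age")


final_sql, ok = adjust_until_executable(
    conn, "How old is Ann?", "SELECT agee FROM singer WHERE name = 'Ann'", toy_repair
)
print(final_sql, ok)
```

Capping the loop at `max_rounds` bounds the number of extra model calls, which matters given the abstract's focus on generation cost; a query that still fails after the budget is returned with `ok = False` instead of being passed on silently.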


