Hybrid Querying Over Relational Databases and Large Language Models

arXiv - CS - Databases | Pub Date: 2024-08-01 | DOI: arxiv-2408.00884

Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi


Citations: 0


Hybrid Querying Over Relational Databases and Large Language Models

Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present the first cross-domain benchmark, SWAN, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and also discuss potential future directions. Our evaluation demonstrates that HQDL using GPT-4 Turbo with few-shot prompts achieves 40.0% in execution accuracy and 48.2% in data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
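To make the idea of a beyond-database question concrete, the following is a minimal sketch and not the authors' HQDL implementation. It assumes a toy `hero` table that stores heights but not publishers, and a hypothetical `ask_llm_publisher` function that stands in for a real LLM call (here it is a hard-coded dictionary). The values the "LLM" supplies are written to a temporary table so that ordinary SQL can join against them. The `execution_match` helper shows one common way to score a single question by comparing result sets; the abstract does not define the paper's exact metrics, so treat that helper as an assumption as well.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical stand-in for an LLM call. A real system would prompt a model;
# this dictionary only simulates knowledge that lives outside the database.
_FAKE_LLM_KNOWLEDGE = {"Spider-Man": "Marvel Comics", "Batman": "DC Comics"}


def ask_llm_publisher(hero_name: str) -> Optional[str]:
    """Return the publisher the simulated LLM associates with a hero, if any."""
    return _FAKE_LLM_KNOWLEDGE.get(hero_name)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with invented toy data (no publisher column)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hero (name TEXT PRIMARY KEY, height_cm INTEGER)")
    conn.executemany(
        "INSERT INTO hero VALUES (?, ?)",
        [("Spider-Man", 178), ("Batman", 188), ("Unknown Hero", 170)],
    )
    return conn


def hybrid_query(conn: sqlite3.Connection) -> List[Tuple[str, int, str]]:
    """Answer 'which Marvel heroes are stored, and how tall are they?'.

    The publisher is not in the database, so it is requested from the
    simulated LLM, materialized as a temporary table, and joined in SQL.
    """
    names = [row[0] for row in conn.execute("SELECT name FROM hero ORDER BY name")]
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS llm_publisher "
        "(name TEXT PRIMARY KEY, publisher TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO llm_publisher VALUES (?, ?)",
        [(name, ask_llm_publisher(name)) for name in names],
    )
    return conn.execute(
        """
        SELECT h.name, h.height_cm, p.publisher
        FROM hero AS h
        JOIN llm_publisher AS p ON p.name = h.name
        WHERE p.publisher = 'Marvel Comics'
        ORDER BY h.name
        """
    ).fetchall()


def execution_match(predicted: List[tuple], gold: List[tuple]) -> bool:
    """Assumed scoring rule: a question counts as correct when the rows match."""
    return sorted(predicted) == sorted(gold)


if __name__ == "__main__":
    rows = hybrid_query(build_demo_db())
    print(rows)
    print(execution_match(rows, [("Spider-Man", 178, "Marvel Comics")]))
```

Materializing the generated values as a table keeps the final step in plain SQL, so a wrong or missing LLM answer (such as the `NULL` publisher for "Unknown Hero") shows up as a data-quality problem rather than a query error.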
