Hybrid Querying Over Relational Databases and Large Language Models

Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi
Published in: arXiv - CS - Databases, 2024-08-01
DOI: arxiv-2408.00884 (https://doi.org/arxiv-2408.00884)
Citations: 0

Abstract

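Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer such beyond-database questions. In this paper, we present SWAN, the first cross-domain benchmark for this task, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions, we present HQDL, a preliminary solution for hybrid querying, and discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% execution accuracy and 48.2% data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe our work will inspire further research into more efficient and accurate data systems that seamlessly integrate relational databases and large language models to answer beyond-database questions.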
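To make the idea of a beyond-database question concrete, the sketch below shows one simple way hybrid querying could work: an attribute the database does not store is first materialized by asking an LLM about each row, and ordinary SQL then runs over the augmented table. This is an illustrative assumption, not necessarily how HQDL is implemented. The table, its illustrative population values, the prompt template, and the functions `fake_llm`, `materialize_column` and `run_example` are all hypothetical. `fake_llm` is an offline stub that stands in for a real model call (for example, GPT-4 Turbo).

```python
import sqlite3
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub: answers from a fixed lookup so the sketch runs offline."""
    known = {"Paris": "France", "Lyon": "France", "Osaka": "Japan"}
    # Assumed prompt format "<question>: <key>"; recover the key after the last colon.
    city = prompt.rsplit(":", 1)[1].strip()
    return known.get(city, "unknown")


def materialize_column(conn: sqlite3.Connection, table: str, key_col: str,
                       new_col: str, template: str,
                       llm: Callable[[str], str]) -> None:
    """Add `new_col` to `table`, filling each row by asking the LLM about `key_col`.

    Table and column names are interpolated directly, so they must be trusted
    identifiers (never user input). Values are passed as bound parameters.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {new_col} TEXT")
    # Read all keys first so we do not update rows while iterating a cursor.
    keys = [row[0] for row in conn.execute(f"SELECT {key_col} FROM {table}")]
    for key in keys:
        answer = llm(template.format(key=key))  # one model call per row
        conn.execute(f"UPDATE {table} SET {new_col} = ? WHERE {key_col} = ?",
                     (answer, key))
    conn.commit()


def run_example() -> list:
    """Answer 'total population per country' when the database stores no country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT PRIMARY KEY, population INTEGER)")
    conn.executemany("INSERT INTO cities VALUES (?, ?)",
                     [("Paris", 2100000), ("Lyon", 520000), ("Osaka", 2700000)])
    # Step 1: fill the missing attribute from the (stubbed) LLM.
    materialize_column(conn, "cities", "name", "country",
                       "Which country is this city in: {key}", fake_llm)
    # Step 2: a plain SQL query over the augmented table.
    rows = conn.execute("SELECT country, SUM(population) FROM cities "
                        "GROUP BY country ORDER BY country").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_example())  # [('France', 2620000), ('Japan', 2700000)]
```

Materializing the attribute once lets the SQL engine keep handling filters, joins and aggregation, while the model only answers narrow per-row factual questions. The trade-off is one model call per row, and any wrong answer from the model flows directly into the query result.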