Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Array. IF 2.3, Q2 (Computer Science, Theory & Methods). Pub Date: 2024-10-24. DOI: 10.1016/j.array.2024.100368
Yuriy Perezhohin, Fernando Peres, Mauro Castelli
Array, Volume 24, Article 100368. https://www.sciencedirect.com/science/article/pii/S2590005624000341
Citations: 0

Abstract

Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
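
To make the combination of rules, vocabulary, and embeddings more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical cue-word vocabulary (AGGREGATE_CUES) and replaces the pre-trained sentence embedding model with a toy character-trigram embedding (embed), so that it runs with the standard library only. The function names, the schema, and the SQL shapes it produces are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical curated vocabulary: cue phrases mapped to SQL aggregates.
# (Assumption for illustration; the paper's vocabulary is not shown here.)
AGGREGATE_CUES = {
    "how many": "COUNT",
    "number of": "COUNT",
    "average": "AVG",
    "highest": "MAX",
    "lowest": "MIN",
    "total": "SUM",
}


def embed(text):
    """Toy stand-in for a pre-trained sentence embedding: trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(phrase, candidates):
    """Return the schema name whose embedding is closest to the phrase."""
    query = embed(phrase)
    return max(candidates, key=lambda c: cosine(query, embed(c.replace("_", " "))))


def to_sql(question, schema):
    """Translate a simple question into SQL using cue rules plus similarity."""
    q = question.lower().rstrip("?.! ")

    # Rule step: look for an aggregate cue and strip it from the question.
    aggregate = None
    for cue, func in AGGREGATE_CUES.items():
        if cue in q:
            aggregate = func
            q = q.replace(cue, " ")
            break

    # Embedding step: link the remaining words to a table, then a column.
    table = best_match(q, list(schema))
    if aggregate is None or aggregate == "COUNT":
        target = "*"
    else:
        target = best_match(q, schema[table])

    select = f"{aggregate}({target})" if aggregate else target
    return f"SELECT {select} FROM {table}"


# Hypothetical schema: table name -> column names.
schema = {
    "singer": ["name", "age", "country"],
    "concert": ["concert_name", "year", "stadium_id"],
}

print(to_sql("How many singers are there?", schema))
print(to_sql("What is the average age of singers?", schema))
```

No step in this sketch is trained: adapting it to a new domain only requires passing a different schema, which is the property the abstract emphasizes. In a realistic setting, embed would be replaced by a pre-trained sentence embedding model, and the rule step would cover far more SQL constructs than single-table aggregates.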
Source journal: Array (Computer Science: General Computer Science)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days