Speech-to-SQL: toward speech-driven SQL query generation from natural language question

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao
The VLDB Journal, published 2024-02-16. DOI: 10.1007/s00778-024-00837-0

Abstract

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most popular and efficient way of human–computer interaction. This paper works toward designing more effective speech-based interfaces for querying structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and translate it directly into structured query language (SQL) statements. A naive solution works in a cascaded manner: an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, this approach requires a high-quality ASR system and suffers from error compounding between the two components (for example, a misrecognized word in the transcript is carried straight into the generated SQL), which limits its performance. To address these challenges, we propose a novel end-to-end neural architecture named SpeechSQLNet that translates human speech directly into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to synthesize SQL directly from common natural language questions in spoken form, rather than from a natural-language version of SQL. To validate the proposed task and model, we further construct a dataset named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive experiments on this dataset show that SpeechSQLNet can synthesize high-quality SQL queries directly from human speech, outperforming various competitive baselines as well as the cascaded methods in terms of exact match accuracy. We expect Speech-to-SQL to inspire more research on effective and efficient human–machine interfaces that lower the barrier to using relational databases.
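To make the error-compounding argument concrete, here is a minimal toy sketch of the cascaded baseline the abstract criticizes, together with a simplified exact-match metric. It is not SpeechSQLNet or the authors' evaluation code. Everything in it is a hypothetical stand-in: the "audio" is represented by the intended question text, `toy_asr` injects one assumed misrecognition ("Austin" heard as "Boston"), `toy_text_to_sql` handles a single question template over an assumed table `singer(name, city)`, and exact match is a plain string comparison after light normalization (benchmarks such as Spider instead compare SQL components).

```python
import re
from typing import Callable, List, Tuple

# Hypothetical stage types; a real ASR stage would consume audio, not text.
ASR = Callable[[str], str]
TextToSQL = Callable[[str], str]


def toy_asr(utterance: str) -> str:
    """Toy ASR stand-in: returns the spoken question as text but confuses
    two similar-sounding city names (an assumed, illustrative error)."""
    return utterance.replace("Austin", "Boston")


def toy_text_to_sql(question: str) -> str:
    """Toy rule-based text-to-SQL stand-in for one question template,
    assuming a hypothetical table singer(name, city)."""
    match = re.fullmatch(r"Which singers live in (\w+)\?", question)
    if match is None:
        return ""  # give up on questions outside the template
    return f"SELECT name FROM singer WHERE city = '{match.group(1)}'"


def cascaded(utterance: str, asr: ASR, parser: TextToSQL) -> str:
    """Cascade: the parser only sees the ASR transcript, so any
    transcription error flows directly into the generated SQL."""
    return parser(asr(utterance))


def normalize(sql: str) -> str:
    """Simplified normalization: trim, drop a trailing ';', and collapse
    runs of whitespace into single spaces."""
    return " ".join(sql.strip().rstrip(";").split())


def exact_match_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs identical after normalization."""
    if not pairs:
        return 0.0
    hits = sum(normalize(pred) == normalize(gold) for pred, gold in pairs)
    return hits / len(pairs)


examples = [
    ("Which singers live in Austin?", "SELECT name FROM singer WHERE city = 'Austin';"),
    ("Which singers live in Denver?", "SELECT name FROM singer WHERE city = 'Denver'"),
]

# Text-to-SQL on the clean question text: both queries are correct.
clean = [(toy_text_to_sql(q), gold) for q, gold in examples]
print(exact_match_accuracy(clean))  # 1.0

# Cascade: the single ASR slip turns one correct query into a wrong one.
preds = [(cascaded(q, toy_asr, toy_text_to_sql), gold) for q, gold in examples]
print(exact_match_accuracy(preds))  # 0.5
```

The parser itself is unchanged between the two runs; the drop from 1.0 to 0.5 comes entirely from the upstream transcription error, which is the compounding effect an end-to-end model such as SpeechSQLNet is designed to avoid by not relying on an external ASR transcript.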

