Speech-to-SQL: toward speech-driven SQL query generation from natural language question

The VLDB Journal. Pub Date: 2024-02-16. DOI: 10.1007/s00778-024-00837-0

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao



Abstract

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most popular and efficient means of human–computer interaction. This paper works toward designing more effective speech-based interfaces for querying structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem works in a cascaded manner: an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, this approach requires a high-quality ASR system and suffers from error compounding between the two components, which limits its performance. To address these challenges, we propose a novel end-to-end neural architecture named SpeechSQLNet that translates human speech directly into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL from common natural language questions in spoken form, rather than from a natural language-based version of SQL. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL by building on widely used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in exact-match accuracy. We expect Speech-to-SQL to inspire further research on more effective and efficient human–machine interfaces that lower the barrier to using relational databases.
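The cascaded baseline and the exact-match metric mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' code. The two components are toy lookup tables, and every name in it (`cascade`, `toy_asr`, `toy_text_to_sql`, `normalize_sql`, the clip identifiers) is hypothetical. The string-level comparison is also an assumption, a simplification of the component-wise matching that text-to-SQL benchmarks typically use. The only point is to show how one recognition error in the first stage propagates into the final SQL.

```python
from typing import Callable, Dict, List


def normalize_sql(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace.

    Simplification: lowercasing would also alter string literals, so a
    real evaluator should compare parsed query components instead.
    """
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions equal to the gold SQL after normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


def cascade(asr: Callable[[str], str],
            text_to_sql: Callable[[str], str]) -> Callable[[str], str]:
    """Compose an ASR stage and a text-to-SQL stage into one pipeline."""
    return lambda audio: text_to_sql(asr(audio))


# Hypothetical stand-ins for real models. "clip2" is deliberately
# misrecognized ("singers" -> "sinners") to mimic an ASR error.
TRANSCRIPTS: Dict[str, str] = {
    "clip1": "how many singers are there",
    "clip2": "list all sinners",
}
SQL_BY_QUESTION: Dict[str, str] = {
    "how many singers are there": "SELECT count(*) FROM singer",
    "list all singers": "SELECT * FROM singer",
}


def toy_asr(audio_id: str) -> str:
    return TRANSCRIPTS[audio_id]


def toy_text_to_sql(question: str) -> str:
    # An unknown question yields an empty query in this toy setup.
    return SQL_BY_QUESTION.get(question, "")


pipeline = cascade(toy_asr, toy_text_to_sql)
predictions = [pipeline(clip) for clip in ["clip1", "clip2"]]
gold = ["SELECT COUNT(*) FROM singer;", "SELECT * FROM singer"]
accuracy = exact_match_accuracy(predictions, gold)
print(accuracy)
```

In this toy setup the text-to-SQL stage is perfect on clean transcripts, yet the pipeline still gets the second query wrong because the error was introduced upstream. An end-to-end model such as the one the abstract proposes avoids depending on an intermediate transcript.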


