Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition

2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI:10.1109/IALP48816.2019.9037648

Thilini Dinushika, Lakshika Kavmini, Pamoda Abeyawardhana, Uthayasanker Thayasivam, Sanath Jayasena


Citations: 8

Abstract

Conversational Artificial Intelligence is changing the world by turning the conventional computer into a human-like computer. Identifying the speaker's intention is one of the major problems in this field, and a significant obstacle to doing so effectively is the lack of language resources. To address this issue, we present a domain-specific speech command classification system for Sinhala, a low-resource language. The system performs intent detection on spoken Sinhala using Automatic Speech Recognition and Natural Language Understanding, and can be used in value-added applications such as Sinhala speech dialog systems. It consists of an Automatic Speech Recognition engine, which converts continuous natural Sinhala speech to its textual representation, and a text classifier, which determines the user's intention from that text. We also present a novel dataset for this task: a 4.15-hour Sinhala speech corpus in the banking domain. Our Sinhala speech command classification system achieves an accuracy of 89.7% in predicting the intent of an utterance, outperforming the state-of-the-art direct speech-to-intent classification systems developed for Sinhala. The Automatic Speech Recognition engine achieves a Word Error Rate of 12.04% and a Sentence Error Rate of 21.56%. In addition, our experiments provide useful insights on speech-to-intent classification for researchers in low-resource spoken language understanding.
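
The abstract reports a Word Error Rate and a Sentence Error Rate but does not say how they were computed. The sketch below shows the conventional definitions, assuming whitespace tokenization: WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and SER is the fraction of utterances with at least one error. The function names and the English sample transcripts are hypothetical and serve only as illustration; they are not taken from the paper or its corpus.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) for a list of (reference, hypothesis) transcripts."""
    total_errors = 0
    total_ref_words = 0
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_errors += errors
        total_ref_words += len(ref)
        if errors > 0:
            wrong_sentences += 1
    if total_ref_words == 0:
        raise ValueError("need at least one non-empty reference transcript")
    return total_errors / total_ref_words, wrong_sentences / len(pairs)


# Hypothetical banking-style utterances (illustrative only, not from the corpus).
SAMPLE_PAIRS = [
    ("check my account balance", "check my account balance"),
    ("transfer money to savings", "transfer money savings"),
    ("pay the bill", "pay a bill now"),
]

if __name__ == "__main__":
    wer, ser = wer_and_ser(SAMPLE_PAIRS)
    print(f"WER = {wer:.2%}, SER = {ser:.2%}")
```

Because SER counts an utterance as wrong after a single word error, it is normally higher than WER on the same data, which matches the relationship between the two figures reported above.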


