Evaluating target utterance identification method using practical free conversation

2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET). Pub Date: 2020-09-26. DOI: 10.1109/IICAIET49801.2020.9257852

Naoto Kosaka, Yumi Wakita


Abstract

We are developing a conversation support system for the public community. Our concept is that supporting elderly people's active lives by assisting human-to-human conversation is more effective than providing a speech dialogue system. To use a conversation support system in an actual restaurant or lounge environment, the conversation of the target speakers near the microphone must be separated from the ambient noise. We have previously proposed a method for distinguishing utterances spoken near a microphone from those spoken far from it, using the standard deviation of the fundamental frequency (SD-F0) and the standard deviation of the speech power level (SD-SP) of each utterance. In this paper, we evaluate the effectiveness of this identification method on actual free conversation using a support vector machine (SVM). The precision for utterances near the microphone was 87.8%. This suggests that identification based on the standard deviations of the fundamental frequency and speech power can be effective even in real environments. However, the performance depends on the utterance length, the stability of the F0 values in the part of the utterance that exceeds the threshold, and the position of the microphones. In the future, the evaluation should be extended to more speakers and a wider variety of situations in order to define a suitable system specification.
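As a rough illustration of the two per-utterance features and the reported metric, the sketch below computes SD-F0 and SD-SP from frame-level values and evaluates precision for the "near" class. It is not the authors' implementation: the function names, the frame-level input format, the convention that unvoiced frames carry an F0 of 0 or None, and the use of the population standard deviation are all assumptions made for this example. The SVM step itself is omitted; in practice the feature pairs would be passed to a classifier.

```python
from statistics import pstdev


def utterance_features(f0_hz, power_db):
    """Return (SD-F0, SD-SP) for a single utterance.

    f0_hz    -- per-frame fundamental frequency in Hz; unvoiced frames are
                assumed to be marked with 0 or None (assumption).
    power_db -- per-frame speech power level in dB (assumption).
    """
    # Keep only voiced frames, since F0 is undefined elsewhere.
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 2 or len(power_db) < 2:
        raise ValueError("need at least two voiced frames and two power frames")
    # Population standard deviation is an assumption; the abstract does not
    # say which estimator is used.
    return pstdev(voiced), pstdev(power_db)


def precision(predicted, actual, positive="near"):
    """Precision for the positive class: TP / (TP + FP)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == positive and a == positive)
    false_pos = sum(1 for p, a in pairs if p == positive and a != positive)
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical frame values for one utterance.
sd_f0, sd_sp = utterance_features([100, 0, 120, None, 140], [60, 62, 64])

# Hypothetical classifier output compared with ground-truth labels.
score = precision(["near", "near", "far", "near"],
                  ["near", "far", "far", "near"])
```

Precision here counts how many utterances labelled "near" were truly near the microphone, which matches the kind of figure the abstract reports for the near-microphone class.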


