Analyzing Artificial Neural Networks and Dynamic Time Warping for spoken keyword recognition under transient noise conditions

2015 9th International Conference on Sensing Technology (ICST), Pub Date: 2015-12-01, DOI: 10.1109/ICSENST.2015.7438406

P. López-Meyer, H. A. C. Maruri, Arturo Quinto-Martinez, Omesh Tickoo

Citations: 1

Abstract

Spoken keyword recognition has been studied for several decades, but it has gained significant attention in recent years due to the rapid growth of front-end technology applications for mobile and wearable computing. This work analyzes the performance trade-off between Artificial Neural Network (ANN) and Dynamic Time Warping (DTW) methods for this task under three transient noise conditions (inside a car, in a pub, and outdoors), without any external noise-reduction pre-processing. For this purpose, two types of recognition models were implemented: speaker-dependent (SD) and speaker-independent (SI). On baseline data (i.e., without transient noise), ANN and DTW achieve comparably high keyword recognition precision with SD models; with SI models, however, DTW shows a significant drop in precision. Further precision analyses show how each type of transient noise affects the two methods. In terms of storage, both methods require comparable memory for the SD models, whereas for the SI models the memory required by DTW increases. Finally, timing analysis showed faster recognition with the ANN method.
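
To make the DTW side of this comparison concrete, the following is a minimal sketch of generic DTW template matching for keyword recognition. It is not the authors' implementation: the precomputed feature frames (e.g., MFCC vectors), the Euclidean frame distance, the unconstrained symmetric step pattern, the nearest-template decision rule, and the helper names `dtw_distance`, `recognize`, and `stored_frames` are all illustrative assumptions. It also assumes that an SI recognizer stores templates from several speakers, which is one plausible way DTW memory could grow for SI models; the abstract does not say how the authors built their templates.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumption: each utterance is already a sequence of fixed-length
# feature frames (for example MFCC vectors); feature extraction is out of scope.
Frame = Sequence[float]
Utterance = Sequence[Frame]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames of equal dimensionality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Utterance, template: Utterance) -> float:
    """Accumulated DTW cost with the basic symmetric step pattern and no window."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of the first i query frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query advances, template frame reused
                cost[i][j - 1],      # template advances, query frame reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(query: Utterance, templates: Dict[str, List[Utterance]]) -> Optional[str]:
    """Hypothetical nearest-template rule: keyword whose closest template has the lowest DTW cost."""
    best_word: Optional[str] = None
    best_cost = float("inf")
    for word, examples in templates.items():
        for template in examples:
            c = dtw_distance(query, template)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word


def stored_frames(templates: Dict[str, List[Utterance]]) -> int:
    """Number of feature frames the recognizer must keep in memory."""
    return sum(len(t) for examples in templates.values() for t in examples)


# Toy 1-D "features", purely illustrative (not data from the paper).
templates = {
    "on": [[[0.0], [1.0], [2.0], [1.0]]],
    "off": [[[2.0], [2.0], [0.0], [0.0]]],
}
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a time-stretched "on"
print(recognize(query, templates))  # prints: on
print(stored_frames(templates))     # prints: 8

# Adding a second (hypothetical) speaker's template grows storage.
templates["on"].append([[0.0], [0.5], [1.0], [2.0], [1.0]])
print(stored_frames(templates))     # prints: 13
```

In this sketch every query is aligned against every stored template at O(n·m) cost per pair, so both memory and recognition time grow with the number of templates kept per keyword. An ANN classifier, by contrast, has a fixed parameter count regardless of how many training utterances it saw.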

