Analyzing Artificial Neural Networks and Dynamic Time Warping for spoken keyword recognition under transient noise conditions

P. López-Meyer, H. A. C. Maruri, Arturo Quinto-Martinez, Omesh Tickoo
Published in: 2015 9th International Conference on Sensing Technology (ICST), 2015-12-01. DOI: 10.1109/ICSENST.2015.7438406 (https://doi.org/10.1109/ICSENST.2015.7438406)
Citations: 1

Abstract

Spoken keyword recognition has been studied for several decades, and it has gained significant attention in recent years due to the rapid increase in front-end technology applications for mobile and wearable computing. This work presents the performance trade-offs between Artificial Neural Network (ANN) and Dynamic Time Warping (DTW) methodologies implemented for this task under three transient noise conditions (inside a car, in a pub, and outdoors), with no external noise-reduction pre-processing. Two types of recognition models were implemented: speaker dependent (SD) and speaker independent (SI). Experimental results show comparably high keyword recognition precision for the SD models with both ANN and DTW on baseline data (i.e., no transient noise); for the SI models, however, a significant drop in precision was observed with DTW. Additional precision analyses show how the different types of transient noise affect the two recognition methodologies. In terms of storage, both methodologies require comparable memory for the SD models, whereas the SI model increases the memory needed by the DTW methodology. Lastly, timing analysis showed faster recognition with the ANN methodology.
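
For readers unfamiliar with DTW-based keyword matching, the following is a minimal textbook sketch, not the authors' implementation. The abstract does not state which acoustic features, distance measure, or DTW variant were used, so the per-frame Euclidean distance, the function names `dtw_distance` and `recognize`, and the toy one-dimensional "features" are all assumptions made for illustration. The sketch also hints at one plausible reading of the reported storage and timing results: a template-based recognizer keeps every reference utterance and compares the query against each of them.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension feature vectors.

    Assumption: Euclidean distance per frame pair and the standard
    (insert, delete, match) step pattern; the paper may use something else.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(query, templates):
    """Return (keyword, cost) of the stored template closest to the query.

    `templates` is a hypothetical dict: keyword -> list of reference sequences.
    Every stored reference is compared, so memory and time grow with their count.
    """
    best_kw, best_cost = None, math.inf
    for keyword, references in templates.items():
        for ref in references:
            c = dtw_distance(query, ref)
            if c < best_cost:
                best_kw, best_cost = keyword, c
    return best_kw, best_cost


# Toy example with 1-D "features"; real systems would use multi-dimensional frames.
templates = {
    "yes": [[(0.0,), (1.0,), (2.0,)]],
    "no": [[(2.0,), (1.0,), (0.0,)]],
}
query = [(0.0,), (0.0,), (1.0,), (2.0,)]  # a time-stretched "yes"
print(recognize(query, templates))  # ('yes', 0.0)
```

In this sketch a speaker-independent setup would simply add more reference sequences per keyword (for example, from several speakers), which is consistent with, though not confirmed as the cause of, the higher memory use the abstract reports for SI models with DTW.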