P. López-Meyer, H. A. C. Maruri, Arturo Quinto-Martinez, Omesh Tickoo
2015 9th International Conference on Sensing Technology (ICST), published 2015-12-01. DOI: 10.1109/ICSENST.2015.7438406 (https://doi.org/10.1109/ICSENST.2015.7438406)
Analyzing Artificial Neural Networks and Dynamic Time Warping for spoken keyword recognition under transient noise conditions
Spoken keyword recognition has been studied for several decades, and has gained significant attention in recent years due to the rapid growth of front-end technology applications for mobile and wearable computing. This work presents the performance trade-offs between Artificial Neural Network (ANN) and Dynamic Time Warping (DTW) methodologies, implemented for this task under three different transient noise conditions (inside a car, in a pub, and outdoors), with no external noise-reduction pre-processing. For this purpose, two types of recognition models were implemented: speaker-dependent (SD) and speaker-independent (SI). Experimental results show comparably high keyword recognition precision in SD models for both ANN and DTW on baseline data, i.e., without transient noise; for the SI models, however, a significant drop in precision was observed with DTW. Additional precision analyses show how the different types of transient noise affect each recognition methodology. In terms of storage, both methodologies require comparable memory for the SD models, whereas the SI model increases the memory needed by the DTW methodology. Lastly, time performance analysis showed faster recognition with the ANN methodology.
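For readers unfamiliar with DTW-based keyword recognition, the following is a minimal sketch of the general technique, not the authors' implementation. The abstract does not specify the features, the local distance measure, or how templates are stored, so the Euclidean frame distance, the nearest-template decision rule, and the names `dtw_distance`, `recognize`, and `templates` are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension feature vectors.

    Assumption: Euclidean distance between frames and the standard
    (insert, delete, match) step pattern with no window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the keyword whose nearest stored template has the lowest DTW cost.

    `templates` (hypothetical layout) maps keyword -> list of template sequences.
    """
    return min(
        templates,
        key=lambda kw: min(dtw_distance(utterance, t) for t in templates[kw]),
    )


# Toy one-dimensional "features"; real systems would use e.g. spectral frames.
templates = {
    "yes": [[(0.0,), (1.0,), (2.0,)]],
    "no": [[(2.0,), (1.0,), (0.0,)]],
}
# A time-stretched version of the "yes" template.
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(recognize(utterance, templates))
```

In a scheme like this, every stored template must be kept and compared at recognition time. That is one plausible reason a speaker-independent model, which would need templates from many speakers, costs more memory and time than a fixed-size ANN; the abstract itself does not state the cause.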