A Deep Learning Based Sound Event Location and Detection Algorithm Using Convolutional Recurrent Neural Network
Hongxia Zhu, Jun Yan
2022 International Conference on Computer, Information and Telecommunication Systems (CITS), published 2022-07-13. DOI: 10.1109/cits55221.2022.9832991
As sound event detection is applied in a growing number of fields, accurate sound event localization and detection systems have attracted wide attention. In this paper, we propose a sound event localization and detection algorithm based on a convolutional recurrent neural network (CRNN). In the offline phase, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm is used to remove noise of unknown distribution from the collected data set. We then extract filter bank (FBANK) features and generalized cross-correlation (GCC) features from each channel and fuse them. Finally, the fused features are fed into a CRNN combined with a soft attention mechanism to train the model. The CRNN is a multi-task learning framework: the sound category is predicted by a classification task and the sound location by a regression task. Experimental results show that the algorithm is effective and provides accurate category and location estimates.
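To make the GCC feature mentioned in the abstract concrete, the following is a minimal NumPy sketch of generalized cross-correlation between two microphone channels. The abstract does not state which GCC weighting the authors use; the PHAT (phase transform) weighting here is an assumption, chosen because it is a common variant. The function name `gcc_phat`, the `max_lag` parameter, and the synthetic test signal are hypothetical and are not taken from the paper.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC with PHAT weighting (assumed variant) for lags -max_lag..max_lag.

    A positive peak lag means x is delayed relative to y by that many samples.
    """
    n = len(x) + len(y)              # zero-pad so circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)    # back to the lag domain
    # Negative lags sit at the end of cc; move them in front of lag 0.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


# Synthetic two-channel example: channel x is channel y delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
y = s
x = np.concatenate((np.zeros(5), s[:-5]))

max_lag = 16
gcc = gcc_phat(x, y, max_lag)
lag = int(np.argmax(gcc)) - max_lag  # estimated delay of x relative to y, in samples
print(lag)
```

In a pipeline like the one the abstract describes, such a lag-domain vector would be computed per frame and stacked alongside the FBANK features before being passed to the network; how the authors actually perform the fusion is not specified in the abstract.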