Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.622
Seorim Hwang
Title: Chord-based stepwise Korean Trot music generation technique using RNN-GAN. Journal of the Acoustical Society of Korea, vol. 39, pp. 622-628.
Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.524
Kunwoo Kim, Seo-Yoon Ryu, C. Cheong, Seongjin Seo, Cheolmin Jang, Hanshin Seol
In this study, the noise radiated from a high-speed fan-motor unit for a cordless vacuum cleaner is reduced by adding splitter blades to the existing impeller. First, to investigate the flow field through the fan-motor unit, especially the impeller, the unsteady incompressible Reynolds-Averaged Navier-Stokes (RANS) equations are solved numerically using computational fluid dynamics. With the predicted flow field as input, the Ffowcs Williams-Hawkings (FW-H) integral equation is solved to predict the aerodynamic noise radiated from the impeller. The numerical methods are validated by comparing the predicted sound pressure spectrum with the measured one. Further analysis of the predicted flow field shows that a strong vortex forms between the impeller blades. Because this vortex causes flow losses and acts as an aerodynamic noise source, supplementary splitter blades are added to the existing impeller to suppress it. The length and position of the splitter blades are selected as design factors, and the effect of each factor on aerodynamic noise is analyzed numerically using the Taguchi method. From these results, the splitter location and length that minimize the radiated noise are determined. The final design is quieter than the existing one.
Title: Aerodynamic noise reduction of fan motor unit of cordless vacuum cleaner by optimal designing of splitter blades for impeller. Journal of the Acoustical Society of Korea, vol. 39, pp. 524-532.
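The Taguchi analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common "smaller-the-better" signal-to-noise ratio, S/N = -10 log10(mean(y^2)), and a per-level averaging of S/N values (main effects). The abstract does not state which S/N definition, orthogonal array, or factor levels were used, so the function names and the run layout here are hypothetical.

```python
import math
from collections import defaultdict


def sn_smaller_is_better(values):
    """Taguchi smaller-the-better S/N ratio in dB for repeated responses."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


def main_effects(runs):
    """Average the S/N ratio per factor level.

    runs: list of (levels, sn) pairs, where levels maps a factor name
    (e.g. "length", "position"; hypothetical labels) to its level.
    Returns {factor: {level: mean S/N}}.
    """
    acc = defaultdict(lambda: defaultdict(list))
    for levels, sn in runs:
        for factor, level in levels.items():
            acc[factor][level].append(sn)
    return {f: {lv: sum(v) / len(v) for lv, v in by_level.items()}
            for f, by_level in acc.items()}


def best_levels(effects):
    """Pick, for each factor, the level with the highest mean S/N."""
    return {f: max(by_level, key=by_level.get) for f, by_level in effects.items()}
```

With this convention a higher S/N means a lower response, so the level with the largest mean S/N for each factor is the candidate for the quietest design.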
Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.533
Young Geul Yoon, Kang-Hoon Choi, Dong-Gyun Han, Hawsun Sohn, J. Choi
The total length of a sperm whale can be estimated by measuring the Inter-Pulse Interval (IPI) of its clicks, each of which consists of multiple pulses. The IPI corresponds to the two-way travel time of sound through the spermaceti within the whale's head. The IPI can therefore be used to estimate the whale's total length from allometric relationships between head and body length. In this paper, click signals recorded in the East Sea, Korea, in 2017 are analyzed to estimate the size of sperm whales. The body lengths calculated from the relationship between IPI and body length were 9.9 m to 10.9 m, which corresponds to an adult female or a juvenile male sperm whale. This non-lethal acoustic method has been shown to estimate sperm whale size accurately and can provide useful information for domestic sperm whale monitoring.
Title: Size estimation of Sperm Whale in the East Sea of Korea using click signals. Journal of the Acoustical Society of Korea, vol. 39, pp. 533-540.
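The IPI-to-length chain described in the abstract above can be sketched in two steps: find the dominant pulse spacing in a click by autocorrelation, then map it to body length with a quadratic allometric fit. The default coefficients are those commonly attributed to Gordon (1991), with IPI in milliseconds and length in metres. They are an assumption here, since the abstract does not say which relationship the authors used, and the autocorrelation picker is only one of several ways to measure the IPI.

```python
def estimate_ipi_ms(click, fs_hz, min_ms=1.0, max_ms=10.0):
    """Return the lag (in ms) that maximises the click's autocorrelation.

    Only lags inside [min_ms, max_ms] are searched; this window is an
    assumed plausible range, not a value taken from the paper.
    """
    lo = round(min_ms * 1e-3 * fs_hz)
    hi = min(round(max_ms * 1e-3 * fs_hz), len(click) - 1)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(click[i] * click[i + lag] for i in range(len(click) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return 1e3 * best_lag / fs_hz


def body_length_from_ipi(ipi_ms, coeffs=(4.833, 1.453, -0.001)):
    """Quadratic allometric fit: length [m] = a + b*IPI + c*IPI^2 (IPI in ms).

    The default coefficients are an assumption (see the lead-in).
    """
    a, b, c = coeffs
    return a + b * ipi_ms + c * ipi_ms ** 2
```

For a synthetic click whose pulses are spaced 4 ms apart, this sketch gives a length of about 10.6 m, which falls inside the 9.9 m to 10.9 m range reported in the abstract.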
Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.600
Gi Yong Lee and Hyoung-Gook Kim
In this paper, we propose a sound event detection method using multi-channel multi-scale neural networks for a sound-sensing home monitoring system for the hearing impaired. In the proposed system, the two channels with the highest signal quality are selected from several wireless microphone sensors in the home. Three features extracted from the sensor signals (time difference of arrival, pitch range, and the outputs of a multi-scale convolutional neural network applied to the log-mel spectrogram) are fed to a classifier based on a bidirectional gated recurrent neural network to further improve sound event detection performance. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing-impaired user. The experimental results show that the sound event detection method of the proposed system outperforms the existing method and can effectively deliver sound information to the hearing impaired.
Title: Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing. Journal of the Acoustical Society of Korea, vol. 39, pp. 600-605.
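One of the three features named in the abstract above, the time difference of arrival between the two selected channels, can be estimated from the peak of their cross-correlation. The following is a minimal sketch under that assumption. The abstract does not specify the estimator the authors used, and `tdoa_samples` is a hypothetical helper name.

```python
def tdoa_samples(x, y, max_lag):
    """Return the lag (in samples) by which y is delayed relative to x.

    The lag is the position of the cross-correlation peak, searched
    over [-max_lag, max_lag]. A positive result means y lags x.
    """
    n = min(len(x), len(y))
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                val += x[i] * y[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def tdoa_seconds(x, y, fs_hz, max_lag):
    """Convert the sample lag to seconds for a sampling rate fs_hz."""
    return tdoa_samples(x, y, max_lag) / fs_hz
```

In practice a generalized cross-correlation with phase weighting is often preferred in reverberant rooms; the plain correlation above is only the simplest instance of the idea.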
Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.615
Gwantae Kim, Bonhwa Ku, Hanseok Ko
In this paper, we propose a multi-site earthquake event classification method using graph convolution networks. Traditional deep-learning methods for earthquake event classification use single-site observations to estimate the seismic event class. However, robust and accurate classification on a seismic observation network requires a method that uses information from multi-site observations rather than single-site data alone. First, the proposed model employs convolutional neural networks to extract informative embedding features from each single-site observation. Second, graph convolution networks integrate the features from several stations. To evaluate the model, we conduct an ablation study over the model structure and the number of stations. The multi-site model improves accuracy and event recall rate by up to 10 % compared with the single-site model.
Title: Multi-site based earthquake event classification using graph convolution networks. Journal of the Acoustical Society of Korea, vol. 39, pp. 615-621.
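The second stage described in the abstract above, integrating per-station features over a station graph, can be sketched with a single graph convolution layer. This example assumes the widely used propagation rule H' = ReLU(Â H W), where Â is the symmetrically normalized adjacency with self-loops. The abstract does not give the authors' exact layer, graph construction, or pooling, so the adjacency and the mean pooling below are illustrative assumptions.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm @ feats @ weight).

    feats holds one embedding row per station (e.g. the CNN output).
    """
    z = matmul(matmul(normalized_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in z]


def mean_pool(node_feats):
    """Average station embeddings into one event-level vector (assumed readout)."""
    n = len(node_feats)
    return [sum(row[j] for row in node_feats) / n
            for j in range(len(node_feats[0]))]
```

Each station's output row mixes its own embedding with those of its graph neighbours, which is how information from several sites reaches the classifier.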
Pub Date: 2020-11-01; DOI: 10.7776/ASK.2020.39.6.559
Danbi Lee
This paper proposes a noise cancelling algorithm that removes own-ship noise for a towed array sonar. Additional beamforming is performed on a subset of the acoustic array channels to obtain a reference beam signal that is robust to the noise bearing. Frequency-domain Adaptive Noise Cancelling (ANC) based on the Normalized Least Mean Square (NLMS) algorithm is then applied using the reference beam. The bearing of the own-ship noise is estimated from the coherence between the reference beam and the input beam signals. The own-ship noise level is calculated from the beampattern of the noise at the estimated steering angle. This level is used to decide whether to update the filter, so that the removed signal level does not exceed the estimated noise level and loss of the target signal is prevented. Simulation results show that the proposed algorithm maintains its performance when the own-ship deviates from its bearing by 40 % more than the conventional algorithm's limit, and that it detects the target even when the target signal has the same frequency as the own-ship signal.
Title: Own-ship noise cancelling method for towed line array sonars using a beam-formed reference signal. Journal of the Acoustical Society of Korea, vol. 39, pp. 559-567.
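The NLMS-based adaptive noise cancelling named in the abstract above can be sketched as follows. This is a time-domain simplification, whereas the paper works in the frequency domain, and it omits the paper's bearing estimation and the rule that gates filter updates by the estimated noise level. The function name, tap count, and step size are illustrative assumptions.

```python
def nlms_cancel(primary, reference, n_taps=4, mu=0.5, eps=1e-8):
    """Subtract the part of `primary` that is predictable from `reference`.

    primary:   input beam samples (target plus own-ship noise).
    reference: reference beam samples (mostly own-ship noise).
    Returns (cleaned_samples, final_filter_weights).
    """
    w = [0.0] * n_taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        u = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        noise_est = sum(wk * uk for wk, uk in zip(w, u))
        e = primary[n] - noise_est  # error signal = cleaned output
        # Normalised step: divide by the reference energy in the window.
        norm = eps + sum(uk * uk for uk in u)
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        cleaned.append(e)
    return cleaned, w
```

When the primary input contains only filtered reference noise, the weights converge to that filter and the output decays toward zero. A target that is uncorrelated with the reference passes through largely unchanged.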
Pub Date: 2020-09-01; DOI: 10.7776/ASK.2020.39.5.414
C. Park, DongHyun Kim, Hanseok Ko
In this paper, we propose a Dilated Convolution Gate Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods must keep the feature map at a fixed size to extract the segmentation map, because the model is built without pooling operations. As a result, their performance degrades owing to a lack of sparsity and a small receptive field. To mitigate these problems, we use a Gated Linear Unit (GLU) to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to enlarge the receptive field without additional learnable parameters. For the performance evaluation, we use the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that the proposed DCGLU model outperforms the baselines. In particular, the method is robust to natural sound noise at three Signal to Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).
Title: Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label. Journal of the Acoustical Society of Korea, vol. 39, pp. 414-423.
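The two building blocks named in the abstract above can be sketched in one dimension: a dilated convolution, which widens the receptive field without adding kernel weights, and a gated linear unit, which multiplies one convolution output by the sigmoid of another. The paper operates on 2-D feature maps with learned kernels, so this 1-D version, its function names, and its zero padding are simplifying assumptions.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D dilated convolution (cross-correlation form), zero padded."""
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t + i * dilation - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def glu(a, b):
    """Gated linear unit: a * sigmoid(b), element-wise."""
    return [ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)]


def dcglu_layer(x, k_lin, k_gate, dilation):
    """Dilated convolution for the linear and gate paths, combined by a GLU."""
    return glu(dilated_conv1d(x, k_lin, dilation),
               dilated_conv1d(x, k_gate, dilation))


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Three stacked layers with kernel size 3 cover 7 samples at dilation 1, but 15 samples with dilations 1, 2 and 4, using the same number of weights.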
Pub Date: 2020-09-01; DOI: 10.7776/ASK.2020.39.5.447
Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang
In this paper, we propose a system that extracts effective speaker representations from a speech signal using deep learning. Because a speech signal contains information unrelated to identity, such as text content, emotion, and background noise, we train the model so that the extracted features represent only speaker-related information. Specifically, we propose an auto-encoder based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve reconstruction performance in the decoding process, we also introduce a discriminator of the kind widely used in Generative Adversarial Network (GAN) structures. Improving the decoding capability helps preserve speaker information and supports disentanglement, which in turn improves speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the benchmark dataset VoxCeleb1.
Title: A study on speech disentanglement framework based on adversarial learning for speaker recognition. Journal of the Acoustical Society of Korea, vol. 39, pp. 447-453.
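The Equal Error Rate used as the evaluation metric in the abstract above is the operating point where the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be computed from verification scores follows. It assumes higher scores mean "same speaker" and approximates the crossing point by a sweep over the observed scores; the score values in any usage are hypothetical, not results from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from same-speaker and different-speaker scores.

    A trial is accepted when its score is >= the threshold. The threshold
    with the smallest |FAR - FRR| is selected, and the mean of the two
    rates at that threshold is returned.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        far = sum(s >= th for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < th for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

A lower EER indicates that the speaker embeddings separate same-speaker and different-speaker trials more cleanly.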
Pub Date: 2020-09-01; DOI: 10.7776/ASK.2020.39.5.390
Inyong Jeong, Soohong Min, D. Paeng
Underwater ambient noise was measured for 4 months at eastern and western coastal sites of Jeju Island, where the water depth was 20 m, using a hydrophone moored at mid-depth (10 m). The eastern and western sites were selected as potential sites for an offshore wind power generator and the current wave energy generator, respectively. Ambient noise was affected by environmental factors such as wind and waves; the corresponding data were collected from nearby weather stations and an observation station. Below 100 Hz, ambient noise varied by about 5 dB to 20 dB between low and high tide. Below 1 kHz, waves and wind were the main sources of ambient noise, which varied by up to 25 dB. Ambient noise was strongly influenced by waves at lower frequencies and by wind at higher frequencies, up to over 1 kHz. The frequency range above 10 kHz was influenced by rainfall and biological sources, and the measured spectrum in this range was about 10 dB higher than the peak spectrum level of the Wenz curve.
Title: Moored measurement of the ambient noise and analysis with environmental factors in the coastal sea of Jeju Island. Journal of the Acoustical Society of Korea, vol. 39, pp. 390-399.
Pub Date: 2020-09-01; DOI: 10.7776/ASK.2020.39.5.406
Jungmin Kim, Young Lo Lee, Donghyeon Kim, Hanseok Ko
In this paper, to improve the classification accuracy of bird and amphibian sounds, we use a Gated Linear Unit (GLU) and self-attention, which encourage the network to extract important features from the data and to pick out the relevant frames from the whole input sequence. To use the acoustic data, we convert the 1-D acoustic signal to a log-Mel spectrogram. The GLU then reduces undesirable components in the log-Mel spectrogram, such as background noise. Finally, we apply the proposed temporal self-attention to improve classification accuracy. The data consist of 6 species of birds and 8 species of amphibians, including endangered species, recorded in the natural environment. The proposed method achieves an accuracy of 91 % on the bird data and 93 % on the amphibian data, an improvement of about 6 % to 7 % over existing algorithms.
Title: Temporal attention based animal sound classification. Journal of the Acoustical Society of Korea, vol. 39, pp. 406-413.
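The temporal self-attention mentioned in the abstract above can be illustrated with scaled dot-product attention across spectrogram frames. This sketch uses the frame features directly as queries, keys, and values, with no learned projections. That is a simplification: the abstract does not describe the authors' exact attention formulation, and the function names are hypothetical.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention over time.

    frames: list of T feature vectors, each of length D (e.g. log-Mel
    frames after the GLU stage). Returns (outputs, weights), where
    weights[t] holds the attention that frame t pays to every frame.
    """
    d = len(frames[0])
    weights = []
    for q in frames:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in frames]
        weights.append(softmax(scores))
    outputs = [[sum(w[t] * frames[t][j] for t in range(len(frames)))
                for j in range(d)] for w in weights]
    return outputs, weights
```

Frames that resemble each other receive larger mutual weights, so frames containing the call pattern reinforce one another while dissimilar background frames contribute less.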