Dharmesh M. Agrawal, Hardik B. Sailor, Meet H. Soni, H. Patil
{"title":"基于teo的新型γ matone特征环境声音分类","authors":"Dharmesh M. Agrawal, Hardik B. Sailor, Meet H. Soni, H. Patil","doi":"10.23919/EUSIPCO.2017.8081521","DOIUrl":null,"url":null,"abstract":"In this paper, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for environmental sound classification (ESC) task. TEO can track energy as a function of both amplitude and frequency of an audio signal. TEO is better for capturing energy variations in the signal that is produced by a real physical system, such as, environmental sounds that contain amplitude and frequency modulations. In proposed feature set, we have used Gammatone filterbank since it represents characteristics of human auditory processing. Here, we have used two classifiers, namely, Gaussian Mixture Model (GMM) using cepstral features, and Convolutional Neural Network (CNN) using spectral features. We performed experiments on two datasets, namely, ESC-50, and UrbanSound8K. We compared TEO-based coefficients with Mel filter cepstral coefficients (MFCC) and Gammatone cepstral coefficients (GTCC), in which GTCC used mean square energy. Using GMM, the proposed TEO-based Gammatone Cepstral Coefficients (TEO-GTCC), and its score-level fusion with MFCC gave absolute improvement of 0.45 %, and 3.85 % in classification accuracy over MFCC on ESC-50 dataset. Similarly, on UrbanSound8K dataset the proposed TEO-GTCC, and its score-level fusion with GTCC gave absolute improvement of 1.40 %, and 2.44 % in classification accuracy over MFCC. Using CNN, the score-level fusion of Gammatone spectral coefficient (GTSC) and the proposed TEO-based Gammatone spectral coefficients (TEO-GTSC) gave absolute improvement of 14.10 %, and 14.52 % in classification accuracy over Mel filterbank energies (FBE) on ESC-50 and UrbanSond8K datasets, respectively. This shows that proposed TEO-based Gammatone features contain complementary information which is helpful in ESC task.","PeriodicalId":346811,"journal":{"name":"2017 25th European Signal Processing Conference (EUSIPCO)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"Novel TEO-based Gammatone features for environmental sound classification\",\"authors\":\"Dharmesh M. Agrawal, Hardik B. Sailor, Meet H. Soni, H. Patil\",\"doi\":\"10.23919/EUSIPCO.2017.8081521\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for environmental sound classification (ESC) task. TEO can track energy as a function of both amplitude and frequency of an audio signal. TEO is better for capturing energy variations in the signal that is produced by a real physical system, such as, environmental sounds that contain amplitude and frequency modulations. In proposed feature set, we have used Gammatone filterbank since it represents characteristics of human auditory processing. Here, we have used two classifiers, namely, Gaussian Mixture Model (GMM) using cepstral features, and Convolutional Neural Network (CNN) using spectral features. We performed experiments on two datasets, namely, ESC-50, and UrbanSound8K. We compared TEO-based coefficients with Mel filter cepstral coefficients (MFCC) and Gammatone cepstral coefficients (GTCC), in which GTCC used mean square energy. Using GMM, the proposed TEO-based Gammatone Cepstral Coefficients (TEO-GTCC), and its score-level fusion with MFCC gave absolute improvement of 0.45 %, and 3.85 % in classification accuracy over MFCC on ESC-50 dataset. Similarly, on UrbanSound8K dataset the proposed TEO-GTCC, and its score-level fusion with GTCC gave absolute improvement of 1.40 %, and 2.44 % in classification accuracy over MFCC. Using CNN, the score-level fusion of Gammatone spectral coefficient (GTSC) and the proposed TEO-based Gammatone spectral coefficients (TEO-GTSC) gave absolute improvement of 14.10 %, and 14.52 % in classification accuracy over Mel filterbank energies (FBE) on ESC-50 and UrbanSond8K datasets, respectively. This shows that proposed TEO-based Gammatone features contain complementary information which is helpful in ESC task.\",\"PeriodicalId\":346811,\"journal\":{\"name\":\"2017 25th European Signal Processing Conference (EUSIPCO)\",\"volume\":\"111 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 25th European Signal Processing Conference (EUSIPCO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/EUSIPCO.2017.8081521\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 25th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/EUSIPCO.2017.8081521","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Novel TEO-based Gammatone features for environmental sound classification
In this paper, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for environmental sound classification (ESC) task. TEO can track energy as a function of both amplitude and frequency of an audio signal. TEO is better for capturing energy variations in the signal that is produced by a real physical system, such as, environmental sounds that contain amplitude and frequency modulations. In proposed feature set, we have used Gammatone filterbank since it represents characteristics of human auditory processing. Here, we have used two classifiers, namely, Gaussian Mixture Model (GMM) using cepstral features, and Convolutional Neural Network (CNN) using spectral features. We performed experiments on two datasets, namely, ESC-50, and UrbanSound8K. We compared TEO-based coefficients with Mel filter cepstral coefficients (MFCC) and Gammatone cepstral coefficients (GTCC), in which GTCC used mean square energy. Using GMM, the proposed TEO-based Gammatone Cepstral Coefficients (TEO-GTCC), and its score-level fusion with MFCC gave absolute improvement of 0.45 %, and 3.85 % in classification accuracy over MFCC on ESC-50 dataset. Similarly, on UrbanSound8K dataset the proposed TEO-GTCC, and its score-level fusion with GTCC gave absolute improvement of 1.40 %, and 2.44 % in classification accuracy over MFCC. Using CNN, the score-level fusion of Gammatone spectral coefficient (GTSC) and the proposed TEO-based Gammatone spectral coefficients (TEO-GTSC) gave absolute improvement of 14.10 %, and 14.52 % in classification accuracy over Mel filterbank energies (FBE) on ESC-50 and UrbanSond8K datasets, respectively. This shows that proposed TEO-based Gammatone features contain complementary information which is helpful in ESC task.