{"title":"有效识别环境声音的音频特征与分类器分析","authors":"C. Okuyucu, M. Sert, A. Yazıcı","doi":"10.1109/ISM.2013.29","DOIUrl":null,"url":null,"abstract":"Environmental sounds (ES) have different characteristics, such as unstructured nature and typically noise-like and flat spectrums, which make recognition task difficult compared to speech or music sounds. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) by using the HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that, the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set with an average F-measure value of 80.6%.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"50 1","pages":"125-132"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds\",\"authors\":\"C. Okuyucu, M. Sert, A. Yazıcı\",\"doi\":\"10.1109/ISM.2013.29\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Environmental sounds (ES) have different characteristics, such as unstructured nature and typically noise-like and flat spectrums, which make recognition task difficult compared to speech or music sounds. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) by using the HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that, the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set with an average F-measure value of 80.6%.\",\"PeriodicalId\":6311,\"journal\":{\"name\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"volume\":\"50 1\",\"pages\":\"125-132\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2013.29\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2013.29","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
摘要
环境声音(ES)具有不同的特征,例如非结构化的性质和典型的噪声和平坦的频谱,与语音或音乐声音相比,这使得识别任务变得困难。在这里,我们对相当相似的ES类别的识别进行了详尽的特征和分类器分析,并提出了一个最佳代表性特征,以产生更高的识别精度。在实验中,基于11个音频特征(MPEG-7族、ZCR、MFCC和组合),使用HMM和SVM分类器对紧急报警、汽车喇叭、枪、爆炸、汽车、直升机、水、风、雨、掌声、人群、笑声等13个ES类别进行检测和测试。已经进行了大量的实验来证明这些联合特征对ES分类的有效性。实验表明,联合特征集ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity)是最具代表性的特征集,平均f测量值为80.6%。
Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds
Environmental sounds (ES) have different characteristics, such as unstructured nature and typically noise-like and flat spectrums, which make recognition task difficult compared to speech or music sounds. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) by using the HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that, the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set with an average F-measure value of 80.6%.