Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds

2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). Pub Date: 2013-12-09. DOI: 10.1109/ISM.2013.29

C. Okuyucu, M. Sert, A. Yazıcı


Citations: 20

Abstract

Environmental sounds (ES) differ from speech and music: they are unstructured and typically have noise-like, flat spectra, which makes them harder to recognize. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature set that yields higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) using HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set, with an average F-measure of 80.6%.
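To make the spectral descriptors named in the abstract more concrete, the following is a minimal sketch of flatness, centroid, and spread computed for a single audio frame. It is an illustration under stated assumptions, not the authors' implementation: it uses simplified linear-frequency definitions rather than the normative MPEG-7 ones (which operate on log-frequency bands), it omits Audio Harmonicity, and the function names `power_spectrum` and `spectral_descriptors` are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_descriptors(frame, sample_rate, eps=1e-12):
    """Return (flatness, centroid_hz, spread_hz) for one frame.

    Simplified linear-frequency versions (an assumption of this sketch),
    not the normative MPEG-7 descriptor definitions.
    """
    # eps keeps the logarithm finite when a bin holds no energy.
    power = [p + eps for p in power_spectrum(frame)]
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)

    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    # Close to 1 for noise-like spectra, close to 0 for tonal ones.
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geo_mean / (total / len(power))

    # Centroid: power-weighted mean frequency.
    centroid = sum(f * p for f, p in zip(freqs, power)) / total

    # Spread: power-weighted standard deviation around the centroid.
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(variance)
    return flatness, centroid, spread


if __name__ == "__main__":
    rate = 8000
    # A 1 kHz tone (tonal spectrum) versus a unit impulse (flat spectrum).
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
    impulse = [1.0] + [0.0] * 63
    print(spectral_descriptors(tone, rate))
    print(spectral_descriptors(impulse, rate))
```

The two test signals show why flatness is useful for environmental sounds: the pure tone yields a flatness near zero, while the impulse, whose spectrum is flat like that of noise, yields a flatness near one.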

