Feature Selection and Stacking for Robust Discrimination of Speech, Monophonic Singing, and Polyphonic Music

2005 IEEE International Conference on Multimedia and Expo. Published 2005-07-06. DOI: 10.1109/ICME.2005.1521554

Björn Schuller, Brüning J. B. Schmitt, D. Arsic, S. Reiter, M. Lang, G. Rigoll


Citations: 21


Abstract

In this work we seek an optimal set of acoustic features for discriminating speech, monophonic singing, and polyphonic music, so that acoustic media streams can be segmented robustly for annotation and interaction purposes. We further introduce ensemble-based classification approaches to this task. From a basis of 276 attributes, we select the most efficient subset by SVM-based sequential floating forward search (SVM-SFFS). In addition, we report the relevance of individual features as measured by their information gain ratio. As a baseline for comparison, we reduce dimensionality by principal component analysis (PCA). We present an extensive analysis of different classifiers on this task, among them kernel machines, decision trees, and Bayesian classifiers. Moreover, we improve single-classifier performance by bagging and boosting, and finally combine the strengths of several classifiers by StackingC. The database consists of 2,114 samples of speech and singing from 58 persons; 1,000 music clips were taken from the MTV-Europe-Top-20 charts of 1980 to 2000. The strong discrimination results of a working, real-time-capable implementation underline the practicability of the proposed approach.
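The abstract names SFFS as the feature-selection method but does not spell out how it works. The sketch below shows the generic sequential floating forward search under stated assumptions: the criterion is a caller-supplied function `score(subset)` to be maximized. In a wrapper like SVM-SFFS that criterion would presumably be the accuracy of an SVM trained on the candidate subset; no SVM is included here, and the names `sffs` and `toy_score`, as well as the toy feature weights, are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Subset = List[str]


def sffs(
    features: Sequence[str],
    score: Callable[[Subset], float],
    k_max: int,
) -> Dict[int, Tuple[float, Subset]]:
    """Sequential floating forward search (sketch).

    Returns, for each subset size reached, the best (score, subset) found.
    `score` is any criterion to maximize; in an SVM wrapper it would be a
    classifier's accuracy on the subset (assumption, not implemented here).
    """
    k_max = min(k_max, len(features))
    selected: Subset = []
    best: Dict[int, Tuple[float, Subset]] = {}

    while len(selected) < k_max:
        # Inclusion: add the single feature that maximizes the criterion.
        candidates = [f for f in features if f not in selected]
        added = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [added]
        k = len(selected)
        s = score(selected)
        if k not in best or s > best[k][0]:
            best[k] = (s, list(selected))

        # Conditional exclusion ("floating" step): drop the least useful
        # feature while the smaller subset strictly beats the best subset
        # of that size recorded so far.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_reduced, list(reduced))
            else:
                break
    return best


def toy_score(subset: Subset) -> float:
    """Hypothetical criterion: 'b' and 'c' are weak alone but strong together."""
    weights = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 0.0}
    s = set(subset)
    total = sum(weights[f] for f in s)
    if "b" in s and "c" in s:
        total += 3.0  # interaction that plain greedy forward search misses at first
    return total


if __name__ == "__main__":
    result = sffs(["a", "b", "c", "d"], toy_score, 3)
    for size, (value, subset) in sorted(result.items()):
        print(size, value, subset)
```

With this toy criterion, plain forward selection would keep `['a', 'b']` (score 3.0) as the best pair; the conditional exclusion step revisits that choice after adding `'c'` and records `['b', 'c']` (score 5.0) instead. Requiring a strict improvement over the recorded best for each size is what guarantees the backtracking terminates.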

