Musical Instrument Recognition in Polyphonic Audio Using Missing Feature Approach

IEEE Transactions on Audio Speech and Language Processing. Pub Date: 2013-09-01. DOI: 10.1109/TASL.2013.2248720

D. Giannoulis, Anssi Klapuri

Volume 21, pages 1805-1817.

Citations: 32

Abstract

A method is described for musical instrument recognition in polyphonic audio signals where several sound sources are active at the same time. The proposed method is based on local spectral features and missing-feature techniques. A novel mask estimation algorithm is described that identifies spectral regions that contain reliable information for each sound source, and bounded marginalization is then used to treat the feature vector elements that are determined to be unreliable. The mask estimation technique is based on the assumption that the spectral envelopes of musical sounds tend to be slowly varying as a function of log-frequency, so unreliable spectral components can be detected as positive deviations from an estimated smooth spectral envelope. A computationally efficient algorithm is proposed for marginalizing the mask in the classification process. In simulations, the proposed method clearly outperforms reference methods for mixture signals. The proposed mask estimation technique leads to a recognition accuracy that is approximately halfway between a trivial all-one mask (all features are assumed reliable) and an ideal “oracle” mask.
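The two ideas in the abstract, flagging positive deviations from a smooth spectral envelope and then applying bounded marginalization, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The moving-median smoother, the 6 dB threshold, the diagonal-Gaussian class models, the class names `flute` and `other`, and every function name are assumptions made for illustration. The paper's efficient marginalization over the mask itself is not reproduced here.

```python
import math

def smooth_envelope(log_spec, half_width=2):
    """Moving median across log-frequency bands (an assumed smoother)."""
    n = len(log_spec)
    env = []
    for k in range(n):
        window = sorted(log_spec[max(0, k - half_width):min(n, k + half_width + 1)])
        env.append(window[len(window) // 2])
    return env

def estimate_mask(log_spec, threshold_db=6.0, half_width=2):
    """Return 1 for reliable bands, 0 for bands that rise more than
    threshold_db above the smooth envelope (assumed threshold)."""
    env = smooth_envelope(log_spec, half_width)
    return [0 if x - e > threshold_db else 1 for x, e in zip(log_spec, env)]

def log_gauss(x, mu, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def gauss_cdf(x, mu, sigma):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bounded_marginal_loglik(obs, mask, means, sigmas, floor=1e-300):
    """Reliable bands use the Gaussian density. Unreliable bands treat the
    observation as an upper bound on the clean value and integrate the
    density from -inf up to it (unnormalized bounded marginalization)."""
    total = 0.0
    for x, m, mu, s in zip(obs, mask, means, sigmas):
        if m:
            total += log_gauss(x, mu, s)
        else:
            total += math.log(max(gauss_cdf(x, mu, s), floor))
    return total

def classify(obs, mask, models):
    """Pick the class whose model gives the highest masked log-likelihood."""
    return max(models, key=lambda name: bounded_marginal_loglik(obs, mask, *models[name]))

# Hypothetical log-spectrum in dB: a smooth slope with one interfering peak.
spec = [40.0, 39.0, 38.0, 60.0, 36.0, 35.0, 34.0]

# Hypothetical class models: (per-band means, per-band standard deviations).
models = {
    "flute": ([40.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0], [2.0] * 7),
    "other": ([30.0, 30.0, 30.0, 60.0, 30.0, 30.0, 30.0], [2.0] * 7),
}

mask = estimate_mask(spec)
print(mask)                             # [1, 1, 1, 0, 1, 1, 1]
print(classify(spec, mask, models))     # flute
print(classify(spec, [1] * 7, models))  # other (all-one mask is misled by the peak)
```

In this toy case the all-one mask lets the interfering peak dominate the likelihood, while the estimated mask converts that band into an upper-bound constraint, which is the behaviour the abstract attributes to the missing-feature approach.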




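Source journal

IEEE Transactions on Audio Speech and Language Processing (Engineering and Technology: Engineering, Electrical and Electronic)

Self-citation rate

0.00%

Articles published

Review time

24.0 months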

About the journal: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.