On the Use of Auditory Representations for Sparsity-Based Sound Source Separation
J. Burred, T. Sikora
2005 5th International Conference on Information Communications & Signal Processing, 2005-12-06
DOI: 10.1109/ICICS.2005.1689302 (https://doi.org/10.1109/ICICS.2005.1689302)
Citations: 21
Abstract
Sparsity-based source separation algorithms often rely on a transformation into a sparse domain to improve the disjointness of the mixture and thereby facilitate separation. To this end, the most commonly used time-frequency representation has been the short-time Fourier transform (STFT). The purpose of this paper is to study the use of auditory-based representations instead of the STFT. We first evaluate the disjointness properties of the STFT for speech and music signals, and show that auditory representations based on the equivalent rectangular bandwidth (ERB) and Bark frequency scales can improve the disjointness of the transformed mixtures.
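
To make the ERB and Bark scales mentioned above concrete, the following is a minimal sketch of how band center frequencies could be placed uniformly on each auditory scale. The abstract does not state which formulas the authors use, so this sketch assumes two widely used approximations: the Glasberg and Moore ERB-rate formula and the Traunmüller Bark formula (without its low- and high-frequency corrections). The function names and the helper `warped_centers` are hypothetical and chosen only for illustration.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to ERB-rate (assumed: Glasberg & Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def hz_to_bark(f_hz):
    """Map frequency in Hz to Bark (assumed: Traunmueller approximation,
    edge corrections omitted)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Inverse of hz_to_bark, obtained by solving the formula above for f."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def warped_centers(f_min, f_max, n_bands, to_scale, from_scale):
    """Hypothetical helper: n_bands center frequencies in Hz, equally spaced
    on the warped scale between f_min and f_max (n_bands must be >= 2)."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [from_scale(lo + i * step) for i in range(n_bands)]


if __name__ == "__main__":
    erb_centers = warped_centers(100.0, 8000.0, 10, hz_to_erb_rate, erb_rate_to_hz)
    bark_centers = warped_centers(100.0, 8000.0, 10, hz_to_bark, bark_to_hz)
    print("ERB-spaced centers (Hz): ", [round(f, 1) for f in erb_centers])
    print("Bark-spaced centers (Hz):", [round(f, 1) for f in bark_centers])
```

Unlike the STFT, whose bins are equally spaced in Hz, both warped scales place bands densely at low frequencies and sparsely at high frequencies. This non-uniform resolution is the property that distinguishes the auditory representations studied in the paper from the STFT.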