{"title":"Sound event detection using non-negative dictionaries learned from annotated overlapping events","authors":"O. Dikmen, A. Mesaros","doi":"10.1109/WASPAA.2013.6701861","DOIUrl":null,"url":null,"abstract":"Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used in this task, but involve the problem of assigning separated components to sources. In this paper, we propose a method which bypasses the need to build separate sound models. Instead, non-negative dictionaries for the sound content and their annotations are learned in a coupled sense. In the testing stage, time activations of the sound dictionary columns are estimated and used to reconstruct annotations using the annotation dictionary. The method requires no separate training data for classes and in general very promising results are obtained using only a small amount of data.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"350 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WASPAA.2013.6701861","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 53
Abstract
Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used for this task, but they involve the problem of assigning separated components to sources. In this paper, we propose a method that bypasses the need to build separate sound models. Instead, non-negative dictionaries for the sound content and the corresponding annotations are learned in a coupled manner. In the testing stage, time activations of the sound dictionary columns are estimated and used to reconstruct the annotations via the annotation dictionary. The method requires no separate training data per class, and very promising results are generally obtained using only a small amount of data.
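The coupled-dictionary idea described above can be sketched with a toy NMF implementation. Everything here is an assumption for illustration: synthetic data stands in for real spectrogram features and annotations, and plain Euclidean-cost multiplicative updates are used, whereas the paper's actual model and cost function may differ. The key structure matches the abstract: one dictionary is learned over stacked features and annotations so both parts share activations, and at test time only the sound part is used to estimate activations, from which annotations are reconstructed.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(X, rank, n_iter=500, W=None):
    """Euclidean-cost NMF via Lee-Seung multiplicative updates.
    If W is given it is held fixed and only the activations H are updated
    (this is the test-stage activation estimation)."""
    eps = 1e-9
    fixed_W = W is not None
    if W is None:
        W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# --- synthetic stand-ins for real features/annotations (assumption) ---
rank, n_feat, n_cls = 3, 20, 3
H_train = rng.random((rank, 60))
W_sound_true = rng.random((n_feat, rank))   # hypothetical spectral atoms
W_annot_true = np.eye(n_cls)                # one annotation row per event class
V_train = W_sound_true @ H_train            # "spectrogram" features
A_train = W_annot_true @ H_train            # frame-level event annotations

# Training: learn one dictionary over the stacked features+annotations;
# the shared activations H couple the sound and annotation parts.
X = np.vstack([V_train, A_train])
W, H_fit = nmf(X, rank)
W_sound, W_annot = W[:n_feat], W[n_feat:]

# Testing: estimate activations of the sound dictionary on new audio only,
# then reconstruct the annotations with the annotation dictionary.
H_test = rng.random((rank, 15))
V_test = W_sound_true @ H_test
_, H_est = nmf(V_test, rank, W=W_sound)
A_hat = W_annot @ H_est                     # predicted event activity
```

Because the two dictionary parts are learned jointly, no separate per-class model is trained, and the separated-component-to-source assignment problem does not arise: the annotation dictionary directly maps activations to event labels.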