Sound event detection using class activation maps
Jakub Bajzik, R. Jarina
2022 ELEKTRO (ELEKTRO), published 2022-05-23
DOI: 10.1109/ELEKTRO53996.2022.9803350 (https://doi.org/10.1109/ELEKTRO53996.2022.9803350)
Citation count: 0
Abstract
In this paper, we present a system for sound event detection in domestic environments, as defined in Task 4 of the DCASE 2021 challenge. The task requires timestamps that localize each audio event in time, in addition to event class probabilities. We explore the use of class activation maps, known from image processing, in such sound event detection systems. We propose two systems. The first is a convolutional neural network trained for sound event classification using only weakly labeled and unlabeled data; strong labels are then obtained using class activation mapping. The second modifies the baseline system provided by the DCASE organizers by adding class activation mapping as part of the attention mechanism. The experimental results show that class activation maps improve system performance compared with the baseline.
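To illustrate how class activation mapping can turn a clip-level classifier into a source of strong (time-localized) labels, the following is a minimal sketch and not the authors' implementation. It assumes the standard CAM setup of a final convolutional feature map followed by global average pooling and a linear layer. The function names, the frequency-pooled `(channels, time)` feature shape, the min-max normalization, the `threshold` value, and the `hop_seconds` frame step are all hypothetical choices made for this example.

```python
import numpy as np


def class_activation_map(features, weights):
    """Compute per-class activation over time.

    features: (channels, time) output of the last conv layer,
              assumed already pooled over frequency (assumption).
    weights:  (classes, channels) weights of the linear layer that
              follows global average pooling.
    Returns an array of shape (classes, time).
    """
    return weights @ features


def cam_to_events(cam_row, hop_seconds, threshold=0.5):
    """Convert one class's CAM into (onset, offset) pairs in seconds.

    The CAM is min-max normalized and thresholded; both steps are
    illustrative assumptions, not taken from the paper.
    """
    lo, hi = cam_row.min(), cam_row.max()
    if hi - lo < 1e-12:
        return []  # flat map: nothing to localize
    active = (cam_row - lo) / (hi - lo) >= threshold
    # Pad with False so events touching the clip borders produce edges.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    onsets, offsets = edges[0::2], edges[1::2]  # offsets are exclusive
    return [(float(a * hop_seconds), float(b * hop_seconds))
            for a, b in zip(onsets, offsets)]


# Toy input: 2 channels, 6 time frames, one class per channel.
features = np.array([[0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
weights = np.eye(2)

cam = class_activation_map(features, weights)
events_class0 = cam_to_events(cam[0], hop_seconds=0.5)
events_class1 = cam_to_events(cam[1], hop_seconds=0.5)
```

Because the linear layer is applied after average pooling, the time-average of each CAM row equals the clip-level logit for that class, which is why a network trained only on weak labels can still yield a time-resolved map.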