An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection

Xu Zheng, Yan Song, I. Mcloughlin, Lin Liu, Lirong Dai
DOI: 10.1109/ICASSP39728.2021.9414931 (https://doi.org/10.1109/ICASSP39728.2021.9414931)
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021-06-06
Citations: 14

Abstract

This paper presents an improved mean teacher (MT) based method for large-scale weakly labeled semi-supervised sound event detection (SED) that focuses on learning a better student model. Two main improvements are proposed over the authors' previous perturbation-based MT method. First, an event-aware module is designed that fuses multiple branches with different kernel sizes via an attention mechanism. Inserting this module after the convolutional layer lets each neuron adaptively adjust its receptive field to suit different sound events. Second, instead of using the teacher model to provide a consistency cost term, we propose using stochastic inference on unlabeled examples to generate high-quality pseudo-targets by averaging multiple predictions from the perturbed student model. MixUp of both labeled and unlabeled data is further exploited to improve the effectiveness of the student model. Finally, the teacher model is obtained as an exponential moving average (EMA) of the student model and generates the final SED predictions during inference. Experiments on the DCASE2018 Task 4 dataset demonstrate the effectiveness of the proposed method. Specifically, it achieves an F1-score of 42.1%, significantly outperforming the 32.4% achieved by the winning system and the 39.3% achieved by the previous perturbation-based method.
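The three training ingredients named in the abstract (pseudo-targets averaged over several stochastic student passes, MixUp, and an EMA teacher) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the toy clipped-linear "student", the Gaussian input noise used as the perturbation, the number of passes, the decay value, and the Beta parameter for the MixUp weight. The event-aware module is not covered.

```python
import random


def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved toward the student weights (EMA step)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def average_predictions(predict, x, num_passes=4):
    """Average several stochastic forward passes into one pseudo-target."""
    runs = [predict(x) for _ in range(num_passes)]
    return [sum(col) / num_passes for col in zip(*runs)]


def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two examples and of their (pseudo-)targets."""
    def mix(a, b):
        return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]
    return mix(x1, x2), mix(y1, y2)


def make_noisy_student(weights, noise=0.1, seed=0):
    """Hypothetical stand-in for a perturbed student: one row of weights per class."""
    rng = random.Random(seed)

    def predict(x):
        # Perturb the input, then compute one clipped linear score per class.
        noisy = [v + rng.gauss(0.0, noise) for v in x]
        scores = [sum(w * v for w, v in zip(row, noisy)) for row in weights]
        return [min(1.0, max(0.0, s)) for s in scores]

    return predict


if __name__ == "__main__":
    student = make_noisy_student([[0.5, 0.5], [1.0, -1.0]])
    unlabeled_x = [0.8, 0.2]
    # Pseudo-target for an unlabeled clip: mean of several perturbed passes.
    pseudo_y = average_predictions(student, unlabeled_x, num_passes=8)
    labeled_x, labeled_y = [0.1, 0.9], [1.0, 0.0]
    # MixUp weight drawn from a Beta distribution (parameter chosen arbitrarily).
    lam = random.betavariate(0.2, 0.2)
    mixed_x, mixed_y = mixup(labeled_x, labeled_y, unlabeled_x, pseudo_y, lam)
    # After a (not shown) student gradient step, the teacher follows by EMA.
    teacher = ema_update([0.0, 0.0], [0.5, 0.5], decay=0.9)
    print(pseudo_y, mixed_x, mixed_y, teacher)
```

In this sketch the teacher never supplies training targets. It only tracks the student and would be used for prediction at inference time, which matches the role the abstract assigns to it.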