Leveraging deep neural networks with nonnegative representations for improved environmental sound classification

Victor Bisot, R. Serizel, S. Essid, G. Richard
{"title":"Leveraging deep neural networks with nonnegative representations for improved environmental sound classification","authors":"Victor Bisot, R. Serizel, S. Essid, G. Richard","doi":"10.1109/MLSP.2017.8168139","DOIUrl":null,"url":null,"abstract":"This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce a NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established unsupervised and supervised NMF techniques to learn better input representations for deep neural networks. This will allow us, with simple architectures, to reach competitive performance with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets as well as the best systems from the 2016 DCASE challenge.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"67 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MLSP.2017.8168139","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce a NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established unsupervised and supervised NMF techniques to learn better input representations for deep neural networks. This will allow us, with simple architectures, to reach competitive performance with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets as well as the best systems from the 2016 DCASE challenge.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用非负表示的深度神经网络改进环境声音分类
本文介绍了利用基于非负矩阵分解(NMF)的表征方法训练深度神经网络,并将其应用于环境声音分类。用于声音分类的深度学习系统通常依赖于网络从频谱图或手工特征中学习有意义的表示。相反,我们在训练深度网络之前引入了一个基于nmf的特征学习阶段,本文强调了其实用性,特别是对于多源声环境(如声音场景)。我们依靠两种已建立的无监督和有监督NMF技术来学习深度神经网络的更好的输入表示。这将使我们能够使用简单的架构,与更复杂的系统(如用于声学场景分类的卷积网络)达到竞争性能。所提出的系统在两个声学场景分类数据集上的表现优于经过时频表示训练的神经网络,以及2016年DCASE挑战赛中的最佳系统。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Classical quadrature rules via Gaussian processes Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR Differential mutual information forward search for multi-kernel discriminant-component selection with an application to privacy-preserving classification Partitioning in signal processing using the object migration automaton and the pursuit paradigm Inferring room semantics using acoustic monitoring
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1