Polyphonic sound event detection using multi label deep neural networks

2015 International Joint Conference on Neural Networks (IJCNN). Pub Date: 2015-07-12. DOI: 10.1109/IJCNN.2015.7280624

Emre Çakir, T. Heittola, H. Huttunen, T. Virtanen

Citations: 269

Abstract

This paper proposes multi-label neural networks for detecting temporally overlapping sound events in realistic environments. Real-life sound recordings typically contain many overlapping sound events, which makes it hard to recognize each event with standard sound event detection methods. In this work, frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification. The model is evaluated on recordings from realistic everyday environments, and the obtained overall accuracy is 63.8%. The method is compared against a state-of-the-art method that uses non-negative matrix factorization as a pre-processing stage and hidden Markov models as a classifier. The proposed method improves the overall accuracy by 19 percentage points.
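To make the multi-label idea concrete, the following minimal sketch shows how one frame of features can yield several simultaneously active event classes: each class gets its own sigmoid output, and every output is thresholded independently rather than picking a single winner. This is an illustration only. The abstract does not give the network architecture, the feature type, or the decision threshold, so the single dense layer, the hand-picked weights, the class names, and the 0.5 threshold below are all assumptions, and `detect_events` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def detect_events(frame_features, weights, biases, threshold=0.5):
    """Return a multi-hot list with 1 for each class judged active in this frame.

    A single dense layer stands in for the deep network (an assumption made
    to keep the example short). Each class is scored and thresholded
    independently, so any number of classes can be active at once.
    """
    active = []
    for row, bias in zip(weights, biases):
        score = sum(w * x for w, x in zip(row, frame_features)) + bias
        active.append(1 if sigmoid(score) >= threshold else 0)
    return active


# Hypothetical example: 3 spectral features and 3 event classes.
class_names = ["speech", "car", "footsteps"]   # assumed labels
frame = [1.0, 0.0, 2.0]                        # assumed feature values
weights = [[2.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.5]]
biases = [-1.0, -1.0, -1.0]

active = detect_events(frame, weights, biases)
detected = [name for name, flag in zip(class_names, active) if flag]
print(active, detected)
```

With these assumed values the class scores are 1, -1, and 2, so the first and third classes exceed the threshold and are reported together, which is the overlapping (polyphonic) case that a single-label softmax classifier cannot express.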


