Location-invariant representations for acoustic scene classification

2022 30th European Signal Processing Conference (EUSIPCO). Pub Date: 2022-08-29. DOI: 10.23919/eusipco55093.2022.9909672

Akansha Tyagi, Padmanabhan Rajan

Citations: 0

Abstract

High intra-class variance is one of the significant challenges in solving the problem of acoustic scene classification. This work identifies the recording location (or city) of an audio sample as a source of intra-class variation. We overcome this variation by utilising multi-view learning, where each recording location is considered as a view. Canonical correlation analysis (CCA) based multi-view algorithms learn a subspace where samples from the same class are brought together, and samples from different classes are moved apart, irrespective of the views. By considering cities as views, and by using several variants of CCA algorithms, we show that intra-class variation can be reduced, and location-invariant representations can be learnt. The proposed method demonstrates an improvement of more than 8% on the DCASE 2018 and 2019 datasets, when compared to not using the view information.
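The abstract does not say which CCA variants are used, so the following is only a minimal sketch of the underlying idea: plain two-view CCA in NumPy, with two hypothetical cities as the views. The synthetic features, the pairing of recordings across cities by class label, the ridge term `reg`, and all function and variable names are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def fit_cca(X, Y, n_components=2, reg=1e-3):
    """Classical two-view CCA via whitening and SVD.

    X and Y hold one paired sample per row. `reg` is a small ridge term
    (an assumption) that keeps the covariance matrices invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]      # projection for view 1
    Wy = Ky @ Vt.T[:, :n_components]   # projection for view 2
    return mx, my, Wx, Wy, s[:n_components]

# Synthetic stand-in for scene features: a shared class-dependent latent
# vector, distorted differently by each (hypothetical) city.
rng = np.random.default_rng(0)
n, n_classes, d_latent, d_feat = 600, 3, 2, 5
labels = rng.integers(0, n_classes, size=n)
class_means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])

def record(city_mix, city_offset):
    """Draw one recording per label; row i always has class labels[i]."""
    z = class_means[labels] + 0.5 * rng.standard_normal((n, d_latent))
    noise = 0.1 * rng.standard_normal((n, d_feat))
    return z @ city_mix + city_offset + noise

# Row i of X and row i of Y are different recordings of the same class.
X = record(rng.standard_normal((d_latent, d_feat)), rng.standard_normal(d_feat))
Y = record(rng.standard_normal((d_latent, d_feat)), rng.standard_normal(d_feat))

mx, my, Wx, Wy, corrs = fit_cca(X, Y)
Zx = (X - mx) @ Wx   # city 1 in the shared subspace
Zy = (Y - my) @ Wy   # city 2 in the shared subspace
print("canonical correlations:", corrs)
```

Because the two views share only the class-dependent part of the signal, the directions that CCA finds are driven by class structure rather than by the city-specific distortion. A classifier trained on `Zx` and `Zy` together would then see features that depend less on the city, which is the intuition behind treating cities as views.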


