Signal latent subspace: A new representation for environmental sound classification

IF 3.4; Zone 2, Physics and Astrophysics; Q1, Acoustics; Applied Acoustics; Pub Date: 2024-07-25; DOI: 10.1016/j.apacoust.2024.110181
Maha Mahyub, Lincon S. Souza, Bojan Batalo, Kazuhiro Fukui

Abstract

Signal latent subspace: A new representation for environmental sound classification

In this study, we propose Signal Latent Subspace (SLS), a flexible method that classifies environmental sound events using subspace representations of latent features obtained from various neural network-based models. Our main goal is to leverage the high expressiveness of neural networks while retaining the advantages of subspace representation, such as its robustness to noise and its ability to work under small sample size (SSS) conditions. We also propose an ensemble strategy native to the subspace representation, to increase performance and reduce the generalization error. We do this through a product Grassmann manifold (PGM), resulting in SLS-PGM. Each subspace constructed from the latent features of a network can be seen as a point on a factor Grassmann manifold (GM) of that network; through PGM, it is possible to unify the factor manifolds into a single representation and perform classification through a similarity metric on the manifold. We further improve SLS and SLS-PGM in two ways: (1) by using generalized difference subspace (GDS) projection to address the lack of between-class discrimination of the subspace representation and (2) by leveraging fine-tuning regimes to better adapt neural network models to the ESC task. We evaluate our proposed methods, factoring various neural networks, on the ESC-10, ESC-50 and UrbanSound environmental sound datasets, and provide extensive ablation experiments and notes for practical use.
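The abstract does not spell out how the subspaces or the manifold similarity are computed, so the following is only a minimal sketch under stated assumptions: each clip's latent features are summarized by an orthonormal basis from uncentered PCA (via SVD), two subspaces are compared by the mean squared cosine of their canonical angles, and the per-network similarities are combined by a plain average as a stand-in for a similarity on the product manifold. The function names (`subspace_basis`, `subspace_similarity`, `product_similarity`), the subspace dimensions, and the random arrays standing in for network outputs are all hypothetical and not taken from the paper.

```python
import numpy as np


def subspace_basis(features: np.ndarray, dim: int) -> np.ndarray:
    """Return an orthonormal basis (d x dim) for the subspace of latent features.

    features: (n_vectors, d) array, one latent vector per row.
    Assumption: uncentered PCA, i.e. the leading left singular vectors
    of the (d x n_vectors) feature matrix.
    """
    u, _, _ = np.linalg.svd(features.T, full_matrices=False)
    return u[:, :dim]


def subspace_similarity(u1: np.ndarray, u2: np.ndarray) -> float:
    """Mean squared cosine of the canonical angles between two subspaces.

    The singular values of u1.T @ u2 are the cosines of the canonical
    angles when both bases are orthonormal.
    """
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against rounding above 1
    return float(np.mean(cosines ** 2))


def product_similarity(bases_a, bases_b, weights=None) -> float:
    """Combine per-network similarities into one score.

    bases_a, bases_b: lists of bases, one per network (factor manifold).
    Assumption: a weighted average; uniform weights by default.
    """
    sims = np.array([subspace_similarity(a, b) for a, b in zip(bases_a, bases_b)])
    if weights is None:
        weights = np.full(len(sims), 1.0 / len(sims))
    return float(np.dot(weights, sims))


# Toy demo: random arrays stand in for the latent features that two
# hypothetical networks (latent sizes 128 and 64) produce for two clips.
rng = np.random.default_rng(0)
clip_a = [rng.normal(size=(40, 128)), rng.normal(size=(40, 64))]
clip_b = [rng.normal(size=(40, 128)), rng.normal(size=(40, 64))]

bases_a = [subspace_basis(f, dim=5) for f in clip_a]
bases_b = [subspace_basis(f, dim=5) for f in clip_b]

print("combined similarity:", product_similarity(bases_a, bases_b))
```

In a classifier built along these lines, a test clip would be assigned to the class whose reference subspaces give the highest combined similarity. The GDS projection and fine-tuning steps mentioned in the abstract are not covered by this sketch.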

Source journal
Applied Acoustics (Physics: Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
Journal description: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The journal aims to encourage the exchange of practical experience through publication, and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics publishes complete papers, short technical notes, and review articles. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.