FA3-Net: feature aggregation and augmentation with attention network for sound event localization and detection

Applied Intelligence, vol. 55, no. 6 (IF 3.4; Zone 2, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2025-03-18. DOI: 10.1007/s10489-025-06437-x
Chuan Wang, Qinghua Huang
Citations: 0

Abstract

Sound event localization and detection (SELD) combines sound event detection (SED), which identifies the category and duration of sound events, with estimation of each event's direction of arrival (DOA). This multi-task problem is challenging because the features required for the SED and DOA tasks are not entirely aligned; as a result, incomplete feature extraction and suboptimal feature fusion often hinder performance. To address these issues, we propose a feature aggregation and augmentation with attention network (FA3-Net). FA3-Net consists of two main components: the feature aggregation and augmentation with attention (FA3) module and the Conformer module. The FA3 module fuses and enhances high-level features and is designed specifically to handle the distinct requirements of the SED and DOA tasks. It ensures that task-specific features are extracted effectively while improving feature discriminability and reducing confusion. Within the FA3 module, the feature aggregation residual block (FAResBlock) handles task-specific feature aggregation, and the feature augmentation with attention block (FAA block) enhances the feature representation across multiple dimensions. The Conformer module models the temporal sequence, since it captures both local and global dependencies. Finally, audio channel swapping (ACS) is employed to overcome data limitations. Experiments on the STARSS23, DCASE2021 and L3DAS22 datasets show that FA3-Net significantly outperforms other models in both feature aggregation and augmentation, while also being more efficient and lightweight. The code is available at: https://github.com/wangchuan11111111/FA3-NET
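
The abstract names audio channel swapping (ACS) but does not spell out which transforms are used. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes first-order Ambisonics (FOA) audio with channels ordered (W, Y, Z, X) and an ideal plane-wave encoding; the function names `encode_foa`, `wrap_angle` and `swap_channels`, and the three transform modes, are hypothetical choices for illustration. Each transform permutes or negates channels so that the result is a valid recording of the same source at a new direction, and the DOA label is updated to match.

```python
import numpy as np


def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as an ideal FOA plane wave (assumed order: W, Y, Z, X)."""
    w = signal
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    return np.stack([w, y, z, x])


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi


def swap_channels(foa, azimuth, elevation, mode):
    """Apply one illustrative channel-swapping transform.

    Returns the transformed audio together with the updated DOA label.
    """
    w, y, z, x = foa
    if mode == "mirror_azimuth":
        # azimuth -> -azimuth: only the Y channel changes sign
        return np.stack([w, -y, z, x]), wrap_angle(-azimuth), elevation
    if mode == "rotate_90":
        # azimuth -> azimuth + pi/2: new Y is old X, new X is minus old Y
        return np.stack([w, x, z, -y]), wrap_angle(azimuth + np.pi / 2), elevation
    if mode == "flip_elevation":
        # elevation -> -elevation: only the Z channel changes sign
        return np.stack([w, y, -z, x]), azimuth, -elevation
    raise ValueError(f"unknown mode: {mode}")


# Usage: augment one synthetic clip and its label.
rng = np.random.default_rng(0)
clip = encode_foa(rng.standard_normal(1600), azimuth=0.3, elevation=0.2)
augmented, new_az, new_el = swap_channels(clip, 0.3, 0.2, "rotate_90")
```

Because the audio and the label are transformed together, each original clip yields several consistent training examples without any new recordings, which is how this kind of augmentation helps with limited data.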

Source journal
Applied Intelligence (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.