Three-dimensional grid-free sound source localization method based on deep learning

Applied Acoustics (IF 3.4; Zone 2, Physics and Astrophysics; JCR Q1, Acoustics) Pub Date: 2024-09-02 DOI: 10.1016/j.apacoust.2024.110261
Yunjie Zhao, Yansong He, Hao Chen, Zhifei Zhang, Zhongming Xu
Full text: https://www.sciencedirect.com/science/article/pii/S0003682X24004122
Citations: 0

Abstract

Sound source localization (SSL) is widely used to identify the locations of noise sources, a prerequisite for noise control. Deep learning, as a data-driven tool with powerful nonlinear fitting ability, shows broad prospects in SSL. Existing deep learning-based SSL methods provide only a two-dimensional (2D) representation of the source location and cannot recover the coordinates of the source in three-dimensional (3D) space. Although traditional beamforming methods can in principle be generalized directly to 3D scenes, they suffer from insufficient vertical resolution and high computational cost. This study therefore proposes a deep learning-based 3D grid-free SSL method (3DGF) to improve the accuracy and computational efficiency of 3D localization. First, the number of data channels is compressed to fit within the limited memory available during training. Next, a dense convolutional neural network (DenseNet) takes the processed 3D beamforming map as input and outputs the 3D spatial coordinates of the sound source. Because the coordinates are continuous and not constrained by the grid of the beamforming map, the grid-free strategy yields more accurate localization results. The effects of training data volume and compression ratio are then analyzed in simulation, and localization performance is tested at different signal-to-noise ratios (SNRs). Finally, comparisons with DAMAS in both simulation and experiment show that 3DGF improves the accuracy and efficacy of 3D localization. Its satisfactory generalization ability and robustness against noise highlight its potential for practical applications.
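
The grid constraint mentioned in the abstract can be illustrated with a small sketch. This is not the paper's 3DGF method or its DenseNet model; it only assumes a conventional frequency-domain beamformer with free-field monopole steering vectors, and the array layout, frequency, source position, and grid spacing are all hypothetical values chosen for illustration. It shows that an estimate read off a 3D beamforming map can never be closer to the source than the nearest grid node.

```python
import cmath
import math
from itertools import product

C = 343.0                      # speed of sound in air, m/s
FREQ = 4000.0                  # analysis frequency in Hz (illustrative)
K = 2 * math.pi * FREQ / C     # acoustic wavenumber, rad/m

def steering(mics, point):
    """Free-field monopole transfer from `point` to every microphone."""
    vec = []
    for mic in mics:
        r = math.dist(mic, point)
        vec.append(cmath.exp(-1j * K * r) / r)
    return vec

def beam_power(mics, pressures, point):
    """Conventional beamformer output power with a unit-norm steering vector."""
    g = steering(mics, point)
    norm = math.sqrt(sum(abs(v) ** 2 for v in g))
    response = sum(gv.conjugate() * pv for gv, pv in zip(g, pressures)) / norm
    return abs(response) ** 2

# Hypothetical 4 x 4 planar array in the plane z = 0 with 0.1 m spacing.
mics = [(-0.15 + 0.1 * i, -0.15 + 0.1 * j, 0.0)
        for i, j in product(range(4), repeat=2)]

# A single noise-free source placed deliberately between grid nodes.
source = (0.13, -0.07, 0.62)
pressures = steering(mics, source)

# Coarse 3D scan grid with 0.1 m spacing in x, y, and z.
xs = [-0.5 + 0.1 * i for i in range(11)]
zs = [0.3 + 0.1 * i for i in range(8)]
grid = list(product(xs, xs, zs))

# Grid-based estimate: the node with the highest beamformer power.
powers = [beam_power(mics, pressures, p) for p in grid]
grid_estimate = grid[powers.index(max(powers))]

# The best any grid-bound method could do: the node closest to the source.
nearest_node = min(grid, key=lambda p: math.dist(p, source))

grid_error = math.dist(grid_estimate, source)
floor_error = math.dist(nearest_node, source)
peak_power = beam_power(mics, pressures, source)

print("grid estimate:", grid_estimate)
print("grid error (m):", round(grid_error, 4))
print("smallest error any grid node can reach (m):", round(floor_error, 4))
```

In this sketch `floor_error` is a hard lower bound for any estimator that returns a grid node, while the beamformer power itself peaks at the off-grid source position. A model that regresses continuous coordinates, as the abstract describes, is not subject to that bound, although its actual accuracy depends on training and is not demonstrated here.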

Source journal: Applied Acoustics (Physics, Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
Journal description: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways: • Complete Papers • Short Technical Notes • Review Articles; and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.