Three-dimensional grid-free sound source localization method based on deep learning

IF 3.4 · Zone 2, Physics and Astrophysics · Q1 ACOUSTICS · Applied Acoustics · Pub Date: 2024-09-02 · DOI: 10.1016/j.apacoust.2024.110261

Yunjie Zhao, Yansong He, Hao Chen, Zhifei Zhang, Zhongming Xu

Citations: 0

Abstract

Sound source localization (SSL) is widely used to identify the locations of noise sources, a prerequisite for noise control. Deep learning, a data-driven tool with powerful nonlinear fitting ability, shows broad prospects in SSL. Existing deep learning-based SSL methods provide only a two-dimensional (2D) representation of the sound source location and cannot recover the coordinates of the source in three-dimensional (3D) space. Although traditional beamforming methods can in principle be generalized directly to 3D scenes, they suffer from insufficient vertical resolution and high computational cost. This study therefore proposes a 3D grid-free SSL method (3DGF) informed by deep learning to improve the accuracy and computational efficiency of 3D localization. First, the number of data channels is compressed to accommodate limited memory during training. Next, a dense convolutional neural network (DenseNet) takes the processed 3D beamforming map as input and outputs the 3D spatial coordinates of the sound source. Because these coordinates are continuous and not constrained by the grid of the beamforming map, the grid-free strategy yields more accurate localization. The effects of training-data volume and compression ratio are then analyzed in simulation, and localization performance is tested at different signal-to-noise ratios (SNRs). Finally, a comparison with DAMAS in both simulation and experiment demonstrates that 3DGF improves the accuracy and efficacy of 3D localization. Its satisfactory generalization ability and robustness against noise highlight its potential for practical applications.
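To make the grid constraint mentioned in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a conventional frequency-domain beamforming map on a 3D grid for one simulated monopole source and reads off the peak grid point, which is the kind of grid-bound estimate that a coordinate-regressing network avoids. Everything specific here is an assumption made for illustration: the 8 x 8 planar array, the 2 kHz frequency, the grid extents, the unit-norm steering weights, and in particular `compress_channels`, a hypothetical scheme that averages adjacent depth slices. The abstract does not say how the channels are compressed, and the DenseNet regression step itself is not reproduced.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in air in m/s (assumed value)

def steering_vectors(mics, points, k):
    """Free-field monopole transfer functions, shape (n_points, n_mics)."""
    # Distance from every candidate point to every microphone.
    r = np.linalg.norm(points[:, None, :] - mics[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / r

def simulate_csm(mics, source, k):
    """Noise-free cross-spectral matrix of one unit-strength source."""
    p = steering_vectors(mics, source[None, :], k)[0]
    return np.outer(p, p.conj())

def beamform_3d(csm, mics, grid_axes, k):
    """Conventional beamforming power w^H C w on a 3D grid."""
    gx, gy, gz = np.meshgrid(*grid_axes, indexing="ij")
    points = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    g = steering_vectors(mics, points, k)
    # Unit-norm weights: for a single source the map peaks at the source.
    w = g / np.linalg.norm(g, axis=1, keepdims=True)
    power = np.real(np.einsum("pm,mn,pn->p", w.conj(), csm, w))
    return power.reshape(gx.shape), points

def grid_estimate(bmap, points):
    """Grid-bound estimate: coordinates of the strongest grid point."""
    return points[np.argmax(bmap)]

def compress_channels(bmap, ratio):
    """Hypothetical compression: average `ratio` adjacent depth slices."""
    nx, ny, nz = bmap.shape
    if nz % ratio != 0:
        raise ValueError("number of depth slices must be divisible by ratio")
    return bmap.reshape(nx, ny, nz // ratio, ratio).mean(axis=-1)

# Assumed setup: 8 x 8 planar array in the z = 0 plane, 2 kHz analysis frequency.
side = np.linspace(-0.35, 0.35, 8)
mx, my = np.meshgrid(side, side, indexing="ij")
mics = np.stack([mx.ravel(), my.ravel(), np.zeros(mx.size)], axis=1)
k = 2 * np.pi * 2000.0 / C_SOUND

# Scan grid with 0.1 m spacing in x, y and depth z.
axes = (np.linspace(-0.5, 0.5, 11),
        np.linspace(-0.5, 0.5, 11),
        np.linspace(0.5, 1.6, 12))

on_grid = np.array([axes[0][6], axes[1][3], axes[2][5]])
off_grid = on_grid + np.array([0.03, 0.02, 0.04])  # between grid points

bmap_on, points = beamform_3d(simulate_csm(mics, on_grid, k), mics, axes, k)
bmap_off, _ = beamform_3d(simulate_csm(mics, off_grid, k), mics, axes, k)

est_on = grid_estimate(bmap_on, points)
est_off = grid_estimate(bmap_off, points)
err_off = np.linalg.norm(est_off - off_grid)

# Depth slices treated as image channels: 12 channels reduced to 3.
net_input = compress_channels(bmap_off, ratio=4)

print("on-grid source recovered:", np.allclose(est_on, on_grid))
print("off-grid localization error [m]:", err_off)
print("compressed map shape:", net_input.shape)
```

When the source sits exactly on a grid point the peak recovers it, but for the off-grid source no grid point can be closer than about 5 cm, so the peak-picking estimate carries an error that grid refinement can only reduce at higher computational cost. A network that outputs continuous coordinates is not bound by this spacing, which is the motivation the abstract gives for the grid-free strategy.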

Source journal

Applied Acoustics (Physics: Acoustics)

CiteScore: 7.40

Self-citation rate: 11.80%

Articles published: 618

Review time: 7.5 months

About the journal: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways: • Complete Papers • Short Technical Notes • Review Articles; and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.