HRTF Clustering for Robust Training of a DNN for Sound Source Localization

Journal of the Audio Engineering Society (IF 1.1; Zone 4, Engineering and Technology; JCR Q3, Acoustics). Pub Date: 2022-12-12. DOI: 10.17743/jaes.2022.0051

Hugh O’Dwyer, F. Boland


Citations: 0

Abstract

This study shows how spherical sound source localization of binaural audio signals in the mismatched head-related transfer function (HRTF) condition can be improved by implementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representative of their specific cluster. By training a machine learning algorithm on these central HRTFs, it is shown that a more robust algorithm can be trained capable of improving sound source localization accuracy by up to 13% in the mismatched HRTF condition. Concurrently, localization accuracy is decreased by approximately 6% in the matched HRTF condition, which accounts for less than 9% of all test conditions. Results demonstrate that HRTF clustering can vastly improve the robustness of binaural sound source localization to unseen HRTF conditions.
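The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say which clustering algorithm or distance was used, so the sketch assumes a symmetrized cross-accuracy dissimilarity and an exhaustive k-medoids search, where each medoid plays the role of a "central" HRTF set. The function names (`dissimilarity`, `k_medoids`) and the accuracy matrix `acc` are hypothetical; the numbers are made-up toy values, not results from the paper.

```python
from itertools import combinations


def dissimilarity(acc):
    """Turn a cross-accuracy matrix into a symmetric dissimilarity matrix.

    acc[i][j] is assumed to be the localization accuracy (0..1) of a model
    trained on HRTF set i and tested on HRTF set j.
    """
    n = len(acc)
    return [
        [0.0 if i == j else 1.0 - 0.5 * (acc[i][j] + acc[j][i]) for j in range(n)]
        for i in range(n)
    ]


def k_medoids(dist, k):
    """Exhaustive k-medoids: try every choice of k candidate central sets.

    Only practical for a small number of HRTF sets and small k.
    Returns (medoids, labels), where labels[i] is the medoid assigned to set i.
    """
    n = len(dist)
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        # Total distance from every set to its nearest candidate medoid.
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = [min(best_medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return list(best_medoids), labels


# Toy cross-accuracy matrix for six hypothetical HRTF sets (illustrative only):
# sets 0-2 generalize well to each other, as do sets 3-5.
acc = [
    [0.95, 0.80, 0.78, 0.40, 0.42, 0.38],
    [0.82, 0.96, 0.84, 0.41, 0.39, 0.40],
    [0.76, 0.83, 0.94, 0.43, 0.40, 0.41],
    [0.39, 0.42, 0.40, 0.95, 0.79, 0.85],
    [0.41, 0.40, 0.38, 0.77, 0.93, 0.80],
    [0.40, 0.41, 0.42, 0.86, 0.81, 0.96],
]

medoids, labels = k_medoids(dissimilarity(acc), k=2)
print("central sets:", medoids)
print("cluster of each set:", labels)
```

On this toy matrix the search selects sets 1 and 5 as the central sets, one per group. A model would then be trained on the central sets only, in the spirit of the abstract's "central HRTFs"; how the paper actually builds its central sets may differ.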




Source journal

Journal of the Audio Engineering Society (Engineering and Technology: Engineering, Multidisciplinary)

CiteScore: 3.50

Self-citation rate: 14.30%

Review time: 1 month

Journal description: The Journal of the Audio Engineering Society, the official publication of the AES, is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers. The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, patents, new products, and newsworthy developments in the field of audio.

Latest articles from the journal

Distributing Generative Music With Alternator

Orchestra: A Toolbox for Live Music Performances in a Web-Based Metaverse

Hack the Show: Design and Analysis of Three Interaction Modes for Audience Participation

Rocking the Web With Browser-Based Simulations of Tube Guitar Amplifiers

The Web Audio API as a Standardized Interface Beyond Web Browsers