用于声源定位的DNN鲁棒训练的HRTF聚类

IF 1.1 4区 工程技术 Q3 ACOUSTICS Journal of the Audio Engineering Society Pub Date : 2022-12-12 DOI:10.17743/jaes.2022.0051
Hugh O’Dwyer, F. Boland
{"title":"用于声源定位的DNN鲁棒训练的HRTF聚类","authors":"Hugh O’Dwyer, F. Boland","doi":"10.17743/jaes.2022.0051","DOIUrl":null,"url":null,"abstract":"This study shows how spherical sound source localization of binaural audio signals in the mismatchedhead-relatedtransferfunction(HRTF)conditioncanbeimprovedbyimplementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representativeoftheirspecificcluster.BytrainingamachinelearningalgorithmonthesecentralHRTFs,itisshownthatamorerobustalgorithmcanbetrainedcapableofimprovingsound sourcelocalizationaccuracybyupto13%inthemismatchedHRTFcondition.Concurrently,localizationaccuracyisdecreasedbyapproximately6%inthematchedHRTFcondition,which accountsforlessthan9%ofalltestconditions.ResultsdemonstratethatHRTFclusteringcanvastlyimprovetherobustnessofbinauralsoundsourcelocalizationtounseenHRTFconditions.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":null,"pages":null},"PeriodicalIF":1.1000,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HRTF Clustering for Robust Training of a DNN for Sound Source Localization\",\"authors\":\"Hugh O’Dwyer, F. Boland\",\"doi\":\"10.17743/jaes.2022.0051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study shows how spherical sound source localization of binaural audio signals in the mismatchedhead-relatedtransferfunction(HRTF)conditioncanbeimprovedbyimplementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representativeoftheirspecificcluster.BytrainingamachinelearningalgorithmonthesecentralHRTFs,itisshownthatamorerobustalgorithmcanbetrainedcapableofimprovingsound sourcelocalizationaccuracybyupto13%inthemismatchedHRTFcondition.Concurrently,localizationaccuracyisdecreasedbyapproximately6%inthematchedHRTFcondition,which accountsforlessthan9%ofalltestconditions.ResultsdemonstratethatHRTFclusteringcanvastlyimprovetherobustnessofbinauralsoundsourcelocalizationtounseenHRTFconditions.\",\"PeriodicalId\":50008,\"journal\":{\"name\":\"Journal of the Audio Engineering Society\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2022-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Audio Engineering Society\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.17743/jaes.2022.0051\",\"RegionNum\":4,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Audio Engineering Society","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.17743/jaes.2022.0051","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

摘要

本研究表明,在使用机器学习时,如何通过实施HRTF聚类来改善在不匹配的头部相关传递函数(HRTF)条件下双耳音频信号的球形声源定位。引入了一个新的互相关函数、耳间水平差和伽玛酮倒谱系数的特征集,并表明在不匹配的HRTF条件下,该特征集在垂直定位方面优于最先进的方法高达5%。通过检查在来自CIPIC数据库的单个HRTF集上训练的深度神经网络在其他HRTF上的性能,表明HRTF集可以聚类为相似的HRTF组。这导致了具有特定聚类代表性的中心HRTF集合的公式化。通过对这些中心HRTF的机器学习算法进行训练,可以得出结论,在匹配的HRTF条件下,可以训练出一种更完善的算法,能够将声源定位精度提高13%。同时,在预定的HRTF情况下,定位精度降低约6%,结果表明,HRTF聚类可以极大地提高声源定位在HRTF条件下的可信度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HRTF Clustering for Robust Training of a DNN for Sound Source Localization
This study shows how spherical sound source localization of binaural audio signals in the mismatchedhead-relatedtransferfunction(HRTF)conditioncanbeimprovedbyimplementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representativeoftheirspecificcluster.BytrainingamachinelearningalgorithmonthesecentralHRTFs,itisshownthatamorerobustalgorithmcanbetrainedcapableofimprovingsound sourcelocalizationaccuracybyupto13%inthemismatchedHRTFcondition.Concurrently,localizationaccuracyisdecreasedbyapproximately6%inthematchedHRTFcondition,which accountsforlessthan9%ofalltestconditions.ResultsdemonstratethatHRTFclusteringcanvastlyimprovetherobustnessofbinauralsoundsourcelocalizationtounseenHRTFconditions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of the Audio Engineering Society
Journal of the Audio Engineering Society 工程技术-工程:综合
CiteScore
3.50
自引率
14.30%
发文量
53
审稿时长
1 months
期刊介绍: The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers. The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre and post reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.
期刊最新文献
Perceptual Comparison of Dynamic Binaural Reproduction Methods for Sparse Head-Mounted Microphone Arrays Effects of Torso Location and Rotation to HRTF Practical Implementation of Automated Next Generation Audio Production for Live Sports Singer and Audience Evaluations of a Networked Immersive Audio Concert Spatial Sampling of Binaural Room Transfer Functions for Head-Tracked Personal Sound Zones
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1