
Latest Publications from the Journal of the Audio Engineering Society

Influence of the Relative Height of a Dome-Shaped Diaphragm on the Directivity of a Spherical-Enclosure Loudspeaker
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-03-10 · DOI: 10.17743/jaes.2022.0064
Zhichao Zhang, Guang-zheng Yu, Linda Liang
{"title":"Influence of the Relative Height of a Dome-Shaped Diaphragm on the Directivity of a Spherical-Enclosure Loudspeaker","authors":"Zhichao Zhang, Guang-zheng Yu, Linda Liang","doi":"10.17743/jaes.2022.0064","DOIUrl":"https://doi.org/10.17743/jaes.2022.0064","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48242075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis of Löfgren’s Tonearm Optimization
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-01-16 · DOI: 10.17743/jaes.2022.0062
Peet Hickman
{"title":"Analysis of Löfgren’s Tonearm Optimization","authors":"Peet Hickman","doi":"10.17743/jaes.2022.0062/","DOIUrl":"https://doi.org/10.17743/jaes.2022.0062/","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46117909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparison of Full Factorial and Optimal Experimental Design for Perceptual Evaluation of Audiovisual Quality
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-01-16 · DOI: 10.17743/jaes.2022.0063
R. F. Fela, N. Zacharov, Søren Forchhammer
Perceptual evaluation of immersive audiovisual quality is often very labor-intensive and costly because numerous factors and factor levels are included in the experimental design. Therefore, the present study aims to reduce the required experimental effort by investigating the effectiveness of optimal experimental design (OED) compared to classical full factorial design (FFD) in the study using compressed omnidirectional video and ambisonic audio as examples. An FFD experiment was conducted and the results were used to simulate 12 OEDs consisting of D-optimal and I-optimal designs varying with replication and additional data points. The fraction of design space plot and the effect test based on the ordinary least-squares model were evaluated, and four OEDs were selected for a series of laboratory experiments. After demonstrating an insignificant difference between the simulation and experimental data, this study also showed that the differences in model performance between the experimental OEDs and FFD were insignificant, except for some interacting factors in the effect test. Finally, the performance of the I-optimal design with replicated points was shown to outperform that of the other designs. The results presented in this study open new possibilities for assessing perceptual quality in a much
{"title":"Comparison of Full Factorial and Optimal Experimental Design for Perceptual Evaluation of Audiovisual Quality","authors":"R. F. Fela, N. Zacharov, Søren Forchhammer","doi":"10.17743/jaes.2022.0063","DOIUrl":"https://doi.org/10.17743/jaes.2022.0063","url":null,"abstract":"Perceptual evaluation of immersive audiovisual quality is often very labor-intensive and costly because numerous factors and factor levels are included in the experimental design. Therefore, the present study aims to reduce the required experimental effort by investigating the effectiveness of optimal experimental design (OED) compared to classical full factorial design (FFD) in the study using compressed omnidirectional video and ambisonic audio as examples. An FFD experiment was conducted and the results were used to simulate 12 OEDs consisting of D-optimal and I-optimal designs varying with replication and additional data points. The fraction of design space plot and the effect test based on the ordinary least-squares model were evaluated, and four OEDs were selected for a series of laboratory experiments. After demonstrating an insignificant difference between the simulation and experimental data, this study also showed that the differences in model performance between the experimental OEDs and FFD were insignificant, except for some interacting factors in the effect test. Finally, the performance of the I-optimal design with replicated points was shown to outperform that of the other designs. The results presented in this study open new possibilities for assessing perceptual quality in a much","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48046787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
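As a rough illustration of the optimal-design idea in the abstract above, the sketch below greedily selects a D-optimal subset of runs from a full factorial candidate set by maximizing det(XᵀX) for a main-effects-plus-interactions model. It is only a minimal sketch under stated assumptions: the factor levels, run count, and greedy exchange procedure are illustrative and are not the designs, software, or evaluation used in the study.

```python
# Hedged sketch: greedy D-optimal subset selection from a full factorial
# candidate set. Illustrates the general OED-vs-FFD idea only; not the
# authors' procedure. Factor levels and run counts below are made up.
import itertools
import numpy as np

def model_matrix(points):
    """Intercept + main effects + two-way interactions."""
    points = np.asarray(points, dtype=float)
    cols = [np.ones(len(points))]
    cols += [points[:, j] for j in range(points.shape[1])]
    for a, b in itertools.combinations(range(points.shape[1]), 2):
        cols.append(points[:, a] * points[:, b])
    return np.column_stack(cols)

def greedy_d_optimal(candidates, n_runs, seed=0):
    """Pick n_runs rows from `candidates` that (greedily) maximize det(X'X)."""
    rng = np.random.default_rng(seed)
    chosen = list(rng.choice(len(candidates), size=n_runs, replace=False))

    def logdet(idx):
        X = model_matrix(candidates[idx])
        sign, val = np.linalg.slogdet(X.T @ X)
        return val if sign > 0 else -np.inf

    improved = True
    while improved:
        improved = False
        for i in range(n_runs):                 # try swapping each chosen run
            for c in range(len(candidates)):    # against each candidate run
                if c in chosen:
                    continue
                trial = chosen.copy()
                trial[i] = c
                if logdet(trial) > logdet(chosen):
                    chosen, improved = trial, True
    return candidates[chosen]

# Hypothetical example: 3 factors at 3, 3, and 2 levels -> 18-run FFD,
# reduced to a 10-run (approximately) D-optimal design.
levels = [(-1, 0, 1), (-1, 0, 1), (-1, 1)]
ffd = np.array(list(itertools.product(*levels)), dtype=float)
oed = greedy_d_optimal(ffd, n_runs=10)
print(f"FFD runs: {len(ffd)}, OED runs: {len(oed)}")
```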
Recordings of a Loudspeaker Orchestra With Multichannel Microphone Arrays for the Evaluation of Spatial Audio Methods
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-01-16 · DOI: 10.17743/jaes.2022.0059
David Ackermann, Julian Domann, F. Brinkmann, Johannes M. Arend, Martin Schneider, C. Pörschmann, Stefan Weinzierl
For live broadcasting of speech, music, or other audio content, multichannel microphone array recordings of the sound field can be used to render and stream dynamic binaural signals in real time. For a comparative physical and perceptual evaluation of conceptually different binaural rendering techniques, recordings are needed in which all other factors affecting the sound (such as the sound radiation of the sources, the room acoustic environment, and the recording position) are kept constant. To provide such a recording, the sound field of an 18-channel loudspeaker orchestra fed by anechoic recordings of a chamber orchestra was captured in two rooms with nine different receivers. In addition, impulse responses were recorded for each sound source and receiver. The anechoic audio signals, the full loudspeaker orchestra recordings, and all measured impulse responses are available with open access in the Spatially Oriented Format for Acoustics (SOFA 2.1, AES69-2022) format. The article presents the recording process and processing chain as well as the structure of the generated database.
{"title":"Recordings of a Loudspeaker Orchestra With Multichannel Microphone Arrays for the Evaluation of Spatial Audio Methods","authors":"David Ackermann, Julian Domann, F. Brinkmann, Johannes M. Arend, Martin Schneider, C. Pörschmann, Stefan Weinzier","doi":"10.17743/jaes.2022.0059","DOIUrl":"https://doi.org/10.17743/jaes.2022.0059","url":null,"abstract":"For live broadcasting of speech, music, or other audio content, multichannel microphone array recordings of the sound field can be used to render and stream dynamic binaural signals in real time. For a comparative physical and perceptual evaluation of conceptually different binaural rendering techniques, recordings are needed in which all other factors affecting the sound (such as the sound radiation of the sources, the room acoustic environment, and the recording position) are kept constant. To provide such a recording, the sound field of an 18-channel loudspeaker orchestra fed by anechoic recordings of a chamber orchestra was captured in two rooms with nine different receivers. In addition, impulse responses were recorded for each sound source and receiver. The anechoic audio signals, the full loudspeaker orchestra recordings, and all measured impulse responses are available with open access in the Spatially Oriented Format for Acoustics (SOFA 2.1, AES69-2022) format. The article presents the recording process and processing chain as well as the structure of the generated database.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49505084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
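Because the database described above is distributed in the SOFA (AES69) format, which is a netCDF-4 container, a minimal sketch of inspecting such a file with the netCDF4 Python package is given below. The file name is a placeholder and the exact variable layout depends on the SOFA convention chosen for the release; only the standard Data.IR, Data.SamplingRate, and position variables are assumed here.

```python
# Hedged sketch: reading impulse responses from a SOFA (AES69) file via the
# netCDF4 package. The file name is hypothetical; variable presence depends
# on the convention used by the released database.
from netCDF4 import Dataset

path = "loudspeaker_orchestra_room1.sofa"   # placeholder name, not from the paper
with Dataset(path, "r") as sofa:
    ir = sofa.variables["Data.IR"][:]        # typically (measurements, receivers, samples)
    fs = float(sofa.variables["Data.SamplingRate"][0])
    src = sofa.variables["SourcePosition"][:]    # source coordinates
    rcv = sofa.variables["ReceiverPosition"][:]  # receiver coordinates
    print(f"{ir.shape[0]} measurements, {ir.shape[1]} receivers, "
          f"{ir.shape[2]} samples at {fs:.0f} Hz")
```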
The Watkins Woofer
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-01-16 · DOI: 10.17743/jaes.2022.0045
Sébastien Degraeve, J. Oclee-Brown
{"title":"The Watkins Woofer","authors":"Sébastien Degraeve, J. Oclee-Brown","doi":"10.17743/jaes.2022.0045","DOIUrl":"https://doi.org/10.17743/jaes.2022.0045","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43560513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimal Microphone Placement for Single-Channel Sound-Power Spectrum Estimation and Reverberation Effects
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2023-01-16 · DOI: 10.17743/jaes.2022.0052
Samuel D Bellows, T. Leishman
{"title":"Optimal Microphone Placement for Single-Channel Sound-Power Spectrum Estimation and Reverberation Effects","authors":"Samuel D Bellows, T. Leishman","doi":"10.17743/jaes.2022.0052","DOIUrl":"https://doi.org/10.17743/jaes.2022.0052","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48848377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Evaluating Web Audio for Learning, Accessibility, and Distribution
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2022-12-12 · DOI: 10.17743/jaes.2022.0031
Hans Lindetorp, Kjetil Falkenberg
{"title":"Evaluating Web Audio for Learning, Accessibility, and Distribution","authors":"Hans Lindetorp, Kjetil Falkenberg","doi":"10.17743/jaes.2022.0031","DOIUrl":"https://doi.org/10.17743/jaes.2022.0031","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48796608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Dual Task Monophonic Singing Transcription
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2022-12-12 · DOI: 10.17743/jaes.2022.0040
Markus Schwabe, Sebastian Murgul, M. Heizmann
Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.
{"title":"Dual Task Monophonic Singing Transcription","authors":"Markus Schwabe, Sebastian Murgul, M. Heizmann","doi":"10.17743/jaes.2022.0040","DOIUrl":"https://doi.org/10.17743/jaes.2022.0040","url":null,"abstract":"Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41469511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
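To make the dual-task split described in the abstract above concrete, here is a minimal PyTorch sketch with two small, independent frame-wise networks: one producing onset logits and one producing pitch-class logits. The layer sizes, input representation, pitch range, and decoding heuristic are assumptions for illustration, not the architecture or training setup from the paper.

```python
# Hedged sketch (PyTorch): two small independent networks for the two
# subtasks, onset detection and pitch estimation, on a spectrogram-like
# input. All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

N_BINS, N_PITCHES = 128, 61   # assumed: spectrogram bins, pitch classes (e.g., C2..C7)

class FrameCNN(nn.Module):
    """A small frame-wise CNN used as a template by both subtasks."""
    def __init__(self, n_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_BINS, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, n_out, kernel_size=1),
        )
    def forward(self, x):          # x: (batch, N_BINS, frames)
        return self.net(x)         # (batch, n_out, frames) logits per frame

onset_net = FrameCNN(n_out=1)            # frame-wise onset logits
pitch_net = FrameCNN(n_out=N_PITCHES)    # frame-wise pitch-class logits

spec = torch.randn(1, N_BINS, 200)                           # dummy input: 200 frames
onsets = torch.sigmoid(onset_net(spec)).squeeze(1) > 0.5     # (1, 200) onset flags
pitch = pitch_net(spec).argmax(dim=1)                        # (1, 200) pitch indices

# Notes could then be formed by pairing each detected onset with the majority
# pitch until the next onset (a naive decoding heuristic, not the paper's).
print(onsets.shape, pitch.shape)
```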
Audio Capture Using Structural Sensors on Vibrating Panel Surfaces
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2022-12-12 · DOI: 10.17743/jaes.2022.0049
Tre Dipassio, Michael C. Heilemann, M. Bocko
The microphones and loudspeakers of modern compact electronic devices such as smartphones and tablets typically require case penetrations that leave the device vulnerable to environmental damage. To address this, the authors propose a surface-based audio interface that employs force actuators for reproduction and structural vibration sensors to record the vibrations of the display panel induced by incident acoustic waves. This paper reports experimental results showing that recorded speech signals are of sufficient quality to enable high-reliability automatic speech recognition despite degradation by the panel’s resonant properties. The authors report the results of experiments in which acoustic waves containing speech were directed to several panels, and the subsequent vibrations of the panels’ surfaces were recorded using structural sensors. The recording quality was characterized by measuring the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system from which the resulting word error rate was determined. Experiments showed that the word error rate (10%–13%) achieved for the audio signals recorded by the method described in this paper was comparable to that for audio captured by a high-quality studio microphone (10%). The authors also demonstrated a crosstalk cancellation method that enables the system to simultaneously record and play audio signals.
{"title":"Audio Capture Using Structural Sensors on Vibrating Panel Surfaces","authors":"Tre Dipassio, Michael C. Heilemann, M. Bocko","doi":"10.17743/jaes.2022.0049","DOIUrl":"https://doi.org/10.17743/jaes.2022.0049","url":null,"abstract":"The microphones and loudspeakers of modern compact electronic devices such as smartphones and tablets typically require case penetrations that leave the device vulnerable to environmental damage. To address this, the authors propose a surface-based audio interface that employs force actuators for reproduction and structural vibration sensors to record the vibrations of the display panel induced by incident acoustic waves. This paper reports experimental results showing that recorded speech signals are of sufficient quality to enable high-reliability automatic speech recognition despite degradation by the panel’s resonant properties. The authors report the results of experiments in which acoustic waves containing speech were directed to several panels, and the subsequent vibrations of the panels’ surfaces were recorded using structural sensors. The recording quality was characterized by measuring the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system from which the resulting word error rate was determined. Experiments showed that the word error rate (10%–13%) achieved for the audio signals recorded by the method described in this paper was comparable to that for audio captured by a high-quality studio microphone (10%). The authors also demonstrated a crosstalk cancellation method that enables the system to simultaneously record and play audio signals.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42936239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
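Because the evaluation above is reported in terms of word error rate, a short sketch of the standard WER computation (Levenshtein alignment of reference and hypothesis word sequences) follows. The example sentences are invented; this is not the authors' recording, ASR, or scoring pipeline.

```python
# Hedged sketch: standard word error rate via dynamic-programming edit
# distance (substitutions + deletions + insertions over reference length).
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,                  # substitution (or match)
                           dp[i - 1][j] + 1,     # deletion
                           dp[i][j - 1] + 1)     # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Invented example sentences, not data from the study
ref = "the panel records speech with structural sensors"
hyp = "a panel records speech with structural sensor"
print(f"WER = {word_error_rate(ref, hyp):.0%}")   # 2 errors / 7 words ≈ 29%
```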
HRTF Clustering for Robust Training of a DNN for Sound Source Localization
IF 1.4 · CAS Tier 4 (Engineering & Technology) · JCR Q3 (Acoustics) · Pub Date: 2022-12-12 · DOI: 10.17743/jaes.2022.0051
Hugh O’Dwyer, F. Boland
This study shows how spherical sound source localization of binaural audio signals in the mismatched head-related transfer function (HRTF) condition can be improved by implementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representative of their specific cluster. By training a machine learning algorithm on these central HRTFs, it is shown that a more robust algorithm can be trained, capable of improving sound source localization accuracy by up to 13% in the mismatched HRTF condition. Concurrently, localization accuracy is decreased by approximately 6% in the matched HRTF condition, which accounts for less than 9% of all test conditions. Results demonstrate that HRTF clustering can vastly improve the robustness of binaural sound source localization to unseen HRTF conditions.
{"title":"HRTF Clustering for Robust Training of a DNN for Sound Source Localization","authors":"Hugh O’Dwyer, F. Boland","doi":"10.17743/jaes.2022.0051","DOIUrl":"https://doi.org/10.17743/jaes.2022.0051","url":null,"abstract":"This study shows how spherical sound source localization of binaural audio signals in the mismatchedhead-relatedtransferfunction(HRTF)conditioncanbeimprovedbyimplementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representativeoftheirspecificcluster.BytrainingamachinelearningalgorithmonthesecentralHRTFs,itisshownthatamorerobustalgorithmcanbetrainedcapableofimprovingsound sourcelocalizationaccuracybyupto13%inthemismatchedHRTFcondition.Concurrently,localizationaccuracyisdecreasedbyapproximately6%inthematchedHRTFcondition,which accountsforlessthan9%ofalltestconditions.ResultsdemonstratethatHRTFclusteringcanvastlyimprovetherobustnessofbinauralsoundsourcelocalizationtounseenHRTFconditions.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49622444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
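As a rough sketch of the clustering step described above, the code below groups subjects' HRTF sets with k-means over flattened log-magnitude features and picks the set nearest each centroid as that cluster's central HRTF set. The feature choice, array shapes, and use of scikit-learn k-means are assumptions; the study's actual features (cross-correlation function, interaural level difference, Gammatone cepstral coefficients) and clustering procedure may differ.

```python
# Hedged sketch: cluster HRTF sets and select one "central" set per cluster.
# Dummy data and feature choice are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_subjects, n_directions, n_bins = 45, 1250, 64   # CIPIC-like sizes (assumed)

# Stand-in for per-subject HRTF magnitude data: (subjects, directions, bins)
hrtf_mag = np.abs(rng.standard_normal((n_subjects, n_directions, n_bins))) + 1e-3

# One feature vector per subject: flattened log-magnitude spectra
features = 20 * np.log10(hrtf_mag).reshape(n_subjects, -1)

k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

# Central HRTF set of each cluster = the subject closest to its centroid
central_subjects = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
    central_subjects.append(int(members[np.argmin(dists)]))

print("central HRTF set per cluster (subject indices):", central_subjects)
```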