{"title":"Influence of the Relative Height of a Dome-Shaped Diaphragm on the Directivity of a Spherical-Enclosure Loudspeaker","authors":"Zhichao Zhang, Guang-zheng Yu, Linda Liang","doi":"10.17743/jaes.2022.0064","DOIUrl":"https://doi.org/10.17743/jaes.2022.0064","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48242075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Löfgren’s Tonearm Optimization","authors":"Peet Hickman","doi":"10.17743/jaes.2022.0062/","DOIUrl":"https://doi.org/10.17743/jaes.2022.0062/","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46117909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Full Factorial and Optimal Experimental Design for Perceptual Evaluation of Audiovisual Quality","authors":"R. F. Fela, N. Zacharov, Søren Forchhammer","doi":"10.17743/jaes.2022.0063","DOIUrl":"https://doi.org/10.17743/jaes.2022.0063","url":null,"abstract":"Perceptual evaluation of immersive audiovisual quality is often very labor-intensive and costly because numerous factors and factor levels are included in the experimental design. Therefore, the present study aims to reduce the required experimental effort by investigating the effectiveness of optimal experimental design (OED) compared to classical full factorial design (FFD) in the study using compressed omnidirectional video and ambisonic audio as examples. An FFD experiment was conducted and the results were used to simulate 12 OEDs consisting of D-optimal and I-optimal designs varying with replication and additional data points. The fraction of design space plot and the effect test based on the ordinary least-squares model were evaluated, and four OEDs were selected for a series of laboratory experiments. After demonstrating an insignificant difference between the simulation and experimental data, this study also showed that the differences in model performance between the experimental OEDs and FFD were insignificant, except for some interacting factors in the effect test. Finally, the performance of the I-optimal design with replicated points was shown to outperform that of the other designs. The results presented in this study open new possibilities for assessing perceptual quality in a much","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48046787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recordings of a Loudspeaker Orchestra With Multichannel Microphone Arrays for the Evaluation of Spatial Audio Methods","authors":"David Ackermann, Julian Domann, F. Brinkmann, Johannes M. Arend, Martin Schneider, C. Pörschmann, Stefan Weinzier","doi":"10.17743/jaes.2022.0059","DOIUrl":"https://doi.org/10.17743/jaes.2022.0059","url":null,"abstract":"For live broadcasting of speech, music, or other audio content, multichannel microphone array recordings of the sound field can be used to render and stream dynamic binaural signals in real time. For a comparative physical and perceptual evaluation of conceptually different binaural rendering techniques, recordings are needed in which all other factors affecting the sound (such as the sound radiation of the sources, the room acoustic environment, and the recording position) are kept constant. To provide such a recording, the sound field of an 18-channel loudspeaker orchestra fed by anechoic recordings of a chamber orchestra was captured in two rooms with nine different receivers. In addition, impulse responses were recorded for each sound source and receiver. The anechoic audio signals, the full loudspeaker orchestra recordings, and all measured impulse responses are available with open access in the Spatially Oriented Format for Acoustics (SOFA 2.1, AES69-2022) format. The article presents the recording process and processing chain as well as the structure of the generated database.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49505084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Watkins Woofer","authors":"Sébastien Degraeve, J. Oclee-Brown","doi":"10.17743/jaes.2022.0045","DOIUrl":"https://doi.org/10.17743/jaes.2022.0045","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43560513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Microphone Placement for Single-Channel Sound-Power Spectrum Estimation and Reverberation Effects","authors":"Samuel D Bellows, T. Leishman","doi":"10.17743/jaes.2022.0052","DOIUrl":"https://doi.org/10.17743/jaes.2022.0052","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48848377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Web Audio for Learning, Accessibility, and Distribution","authors":"Hans Lindetorp, Kjetil Falkenberg","doi":"10.17743/jaes.2022.0031","DOIUrl":"https://doi.org/10.17743/jaes.2022.0031","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48796608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Task Monophonic Singing Transcription","authors":"Markus Schwabe, Sebastian Murgul, M. Heizmann","doi":"10.17743/jaes.2022.0040","DOIUrl":"https://doi.org/10.17743/jaes.2022.0040","url":null,"abstract":"Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41469511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio Capture Using Structural Sensors on Vibrating Panel Surfaces","authors":"Tre Dipassio, Michael C. Heilemann, M. Bocko","doi":"10.17743/jaes.2022.0049","DOIUrl":"https://doi.org/10.17743/jaes.2022.0049","url":null,"abstract":"The microphones and loudspeakers of modern compact electronic devices such as smartphones and tablets typically require case penetrations that leave the device vulnerable to environmental damage. To address this, the authors propose a surface-based audio interface that employs force actuators for reproduction and structural vibration sensors to record the vibrations of the display panel induced by incident acoustic waves. This paper reports experimental results showing that recorded speech signals are of sufficient quality to enable high-reliability automatic speech recognition despite degradation by the panel’s resonant properties. The authors report the results of experiments in which acoustic waves containing speech were directed to several panels, and the subsequent vibrations of the panels’ surfaces were recorded using structural sensors. The recording quality was characterized by measuring the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system from which the resulting word error rate was determined. Experiments showed that the word error rate (10%–13%) achieved for the audio signals recorded by the method described in this paper was comparable to that for audio captured by a high-quality studio microphone (10%). The authors also demonstrated a crosstalk cancellation method that enables the system to simultaneously record and play audio signals.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42936239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HRTF Clustering for Robust Training of a DNN for Sound Source Localization","authors":"Hugh O’Dwyer, F. Boland","doi":"10.17743/jaes.2022.0051","DOIUrl":"https://doi.org/10.17743/jaes.2022.0051","url":null,"abstract":"This study shows how spherical sound source localization of binaural audio signals in the mismatchedhead-relatedtransferfunction(HRTF)conditioncanbeimprovedbyimplementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representativeoftheirspecificcluster.BytrainingamachinelearningalgorithmonthesecentralHRTFs,itisshownthatamorerobustalgorithmcanbetrainedcapableofimprovingsound sourcelocalizationaccuracybyupto13%inthemismatchedHRTFcondition.Concurrently,localizationaccuracyisdecreasedbyapproximately6%inthematchedHRTFcondition,which accountsforlessthan9%ofalltestconditions.ResultsdemonstratethatHRTFclusteringcanvastlyimprovetherobustnessofbinauralsoundsourcelocalizationtounseenHRTFconditions.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49622444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}