MPEG-I Immersive Audio is a forthcoming standard that is under development within the MPEG Audio group (ISO/IEC JTC1/SC29/WG6) to provide a compressed representation and rendering of audio for Virtual and Augmented Reality (VR/AR) applications with six degrees of freedom (6DoF). MPEG-I Immersive Audio supports bitrate-efficient and high-quality storage/transmission of complex virtual scenes including sources with spatial extent and distinct radiation characteristics (like musical instruments) as well as geometry description of acoustically relevant elements (e.g., walls, doors, occluders). The rendering process includes detailed modeling of room acoustics and complex acoustic phenomena such as occlusion and diffraction due to acoustic obstacles and Doppler effects as well as interactivity with the user. Based on many contributions, this paper reports on the state of the MPEG-I Immersive Audio standardization process and its first technical Reference Model architecture. MPEG-I Immersive Audio establishes the first long-term stable audio format specification in the field of VR/AR and can be used for many consumer applications such as broadcasting, streaming, social VR/AR, or Metaverse technology.
{"title":"MPEG-I Immersive Audio – Reference Model For The Virtual/Augmented Reality Audio Standard","authors":"J. Herre, S. Disch","doi":"10.17743/jaes.2022.0074","DOIUrl":"https://doi.org/10.17743/jaes.2022.0074","url":null,"abstract":"MPEG-I Immersive Audio is a forthcoming standard that is under development within the MPEG Audio group (ISO/IEC JTC1/SC29/WG6) to provide a compressed representation and rendering of audio for Virtual and Augmented Reality (VR/AR) applications with six degrees of freedom (6DoF). MPEG-I Immersive Audio supports bitrate-efficient and high-quality storage/transmission of complex virtual scenes including sources with spatial extent and distinct radiation characteristics (like musical instruments) as well as geometry description of acoustically relevant elements (e.g., walls, doors, occluders). The rendering process includes detailed modeling of room acoustics and complex acoustic phenomena such as occlusion and diffraction due to acoustic obstacles and Doppler effects as well as interactivity with the user. Based on many contributions, this paper reports on the state of the MPEG-I Immersive Audio standardization process and its first technical Reference Model architecture. MPEG-I Immersive Audio establishes the first long-term stable audio format specification in the field of VR/AR and can be used for many consumer applications such as broadcasting, streaming, social VR/AR, or Metaverse technology.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42417044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Binaural Rendering of Spatially Extended Sound Sources","authors":"Carlotta Anemüller, Alexander Adami, J. Herre","doi":"10.17743/jaes.2022.0069","DOIUrl":"https://doi.org/10.17743/jaes.2022.0069","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48454693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work proposes a parametric model for just noticeable differences of unilateral differences in head-related transfer functions (HRTFs). For seven generic magnitude-based distance metrics, common trends in their response to inter-individual and intra-individual HRTF differences are analyzed, identifying metric subgroups with pseudo-orthogonal behavior. On the basis of three representative metrics, a three-alternative forced-choice experiment is conducted, and the acquired discrimination probabilities are set in relation with distance metrics via different modeling approaches. A linear model, with coefficients based on principal component analysis and three distance metrics as input, yields the best performance, compared to a simple multi-linear regression approach or to principal component analysis–based models of higher complexity.
{"title":"A Magnitude-Based Parametric Model Predicting the Audibility of HRTF Variation","authors":"S. Doma, Cosima A. Ermert, J. Fels","doi":"10.17743/jaes.2022.0080","DOIUrl":"https://doi.org/10.17743/jaes.2022.0080","url":null,"abstract":"This work proposes a parametric model for just noticeable differences of unilateral differences in head-related transfer functions (HRTFs). For seven generic magnitude-based distance metrics, common trends in their response to inter-individual and intra-individual HRTF differences are analyzed, identifying metric subgroups with pseudo-orthogonal behavior. On the basis of three representative metrics, a three-alternative forced-choice experiment is conducted, and the acquired discrimination probabilities are set in relation with distance metrics via different modeling approaches. A linear model, with coefficients based on principal component analysis and three distance metrics as input, yields the best performance, compared to a simple multi-linear regression approach or to principal component analysis–based models of higher complexity.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49634559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper discusses the evaluation of Opus-compressed Ambisonic audio content through listening tests conducted in a virtual reality environment. The aim of this study was to investigate the effect that Opus compression has on the Basic Audio Quality (BAQ) of Ambisonic audio in different virtual reality contexts: gaming, music, soundscapes, and teleconferencing. The methods used to produce the test content, how the tests were conducted, the results obtained and their significance are discussed. Key findings were that in all cases, Ambisonic scenes compressed with Opus at 64 kbps/ch using Channel Mapping Family 3 garnered a median BAQ rating not significantly different than uncompressed audio. Channel Mapping Family 3 demonstrated the least variation in BAQ across evaluated contexts, although there were still some significant differences found between contexts at certain bitrates and Ambisonic orders.
{"title":"Context-Based Evaluation of the Opus Audio Codec for Spatial Audio Content in Virtual Reality","authors":"Ben Lee, Tomasz Rudzki, J. Skoglund, G. Kearney","doi":"10.17743/jaes.2022.0068","DOIUrl":"https://doi.org/10.17743/jaes.2022.0068","url":null,"abstract":"This paper discusses the evaluation of Opus-compressed Ambisonic audio content through listening tests conducted in a virtual reality environment. The aim of this study was to investigate the effect that Opus compression has on the Basic Audio Quality (BAQ) of Ambisonic audio in different virtual reality contexts: gaming, music, soundscapes, and teleconferencing. The methods used to produce the test content, how the tests were conducted, the results obtained and their significance are discussed. Key findings were that in all cases, Ambisonic scenes compressed with Opus at 64 kbps/ch using Channel Mapping Family 3 garnered a median BAQ rating not significantly different than uncompressed audio. Channel Mapping Family 3 demonstrated the least variation in BAQ across evaluated contexts, although there were still some significant differences found between contexts at certain bitrates and Ambisonic orders.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49567582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Kirsch, T. Wendt, S. van de Par, Hongmei Hu, S. Ewert
For computational efficiency, acoustic simulation of late reverberation can be simplified by generating a limited number of incoherent signals with frequency-dependent exponential decay radiated by spatially distributed virtual reverberation sources (VRS). A sufficient number of VRS and adequate spatial mapping are required to approximate spatially anisotropic late reverberation, e.g., in rooms with inhomogeneous distribution of absorption or for coupled volumes. For coupled rooms, moreover, a dual-slope decay might be required. Here, an efficient and perceptually plausible method to generate and spatially render late reverberation is suggested. Incoherent VRS signals for (sub-) volumes are generated based on room dimensions and frequency-dependent absorption coefficients at the boundaries. For coupled rooms, (acoustic) portals account for effects of sound propagation and diffraction at the room connection and energy transfer during the reverberant decay process. The VRS are spatially distributed around the listener, with weighting factors representing the spatially subsampled distribution of absorption on the boundaries and the location and solid angle covered by portals. A technical evaluation and listening tests demonstrate the validity of the approach in comparison to measurements in real rooms.
{"title":"Computationally-Efficient Simulation of Late Reverberation for Inhomogeneous Boundary Conditions and Coupled Rooms","authors":"C. Kirsch, T. Wendt, S. van de Par, Hongmei Hu, S. Ewert","doi":"10.17743/jaes.2022.0053","DOIUrl":"https://doi.org/10.17743/jaes.2022.0053","url":null,"abstract":"For computational efficiency, acoustic simulation of late reverberation can be simplified by generating a limited number of incoherent signals with frequency-dependent exponential decay radiated by spatially distributed virtual reverberation sources (VRS). A sufficient number of VRS and adequate spatial mapping are required to approximate spatially anisotropic late reverberation, e.g., in rooms with inhomogeneous distribution of absorption or for coupled volumes. For coupled rooms, moreover, a dual-slope decay might be required. Here, an efficient and perceptually plausible method to generate and spatially render late reverberation is suggested. Incoherent VRS signals for (sub-) volumes are generated based on room dimensions and frequency-dependent absorption coefficients at the boundaries. For coupled rooms, (acoustic) portals account for effects of sound propagation and diffraction at the room connection and energy transfer during the reverberant decay process. The VRS are spatially distributed around the listener, with weighting factors representing the spatially subsampled distribution of absorption on the boundaries and the location and solid angle covered by portals. A technical evaluation and listening tests demonstrate the validity of the approach in comparison to measurements in real rooms.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46550341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Transaural Configurations Inside Usual Rooms","authors":"A. Vidal, P. Herzog, C. Lambourg, Jacques Chatron","doi":"10.17743/jaes.2022.0055","DOIUrl":"https://doi.org/10.17743/jaes.2022.0055","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43301522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons","authors":"Shoichi Koyama, Keisuke Kimura, Natsuki Ueno","doi":"10.17743/jaes.2022.0058","DOIUrl":"https://doi.org/10.17743/jaes.2022.0058","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135861172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Giant FFTs for Sample-Rate Conversion","authors":"V. Välimäki, S. Bilbao","doi":"10.17743/jaes.2022.0061","DOIUrl":"https://doi.org/10.17743/jaes.2022.0061","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43138090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Voice Amplifier: On-Device Noisy Environment-Aware Solution for Dialogue Enhancement in Real Time","authors":"Jaeyoun Cho, Sunmin Kim, Inwoo Hwang","doi":"10.17743/jaes.2022.0065","DOIUrl":"https://doi.org/10.17743/jaes.2022.0065","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43341834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wen-Hsing Lai, T. Chou, Meng-Chen Chou, B. Schuller
{"title":"Robust Audio Watermarking Based on Empirical Mode Decomposition and Group Differential Relations","authors":"Wen-Hsing Lai, T. Chou, Meng-Chen Chou, B. Schuller","doi":"10.17743/jaes.2022.0067","DOIUrl":"https://doi.org/10.17743/jaes.2022.0067","url":null,"abstract":"","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44721140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}