Interaural Time Difference Prediction Using Anthropometric Interaural Distance
Jaan Johansson, A. Mäkivirta, Matti Malinen, Ville Saari
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0038

This paper studies the feasibility of predicting the interaural time difference (ITD) in azimuth and elevation once the personal anthropometric interaural distance is known, proposing an enhancement for spherical head ITD models to increase their accuracy. The method and enhancement are developed using data in a Head-Related Impulse Response (HRIR) data set comprising photogrammetrically obtained personal 3D geometries for 170 persons and then evaluated using three acoustically measured HRIR data sets containing 119 persons in total. The directions include 360° in azimuth and −15° to 60° in elevation. The prediction error for each data set is described, the proportion of persons under a given error in all studied directions is shown, and the directions in which large errors occur are analyzed. The enhanced spherical head model can predict the ITD such that the 1st and 99th percentile levels of the ITD prediction error for all persons and in all directions remain below 122 μs. The anthropometric interaural distance could potentially be measured directly on a person, enabling personalized ITD without measuring the HRIR. The enhanced model can personalize ITD in binaural rendering for headphone reproduction in games and immersive audio applications.

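The enhancement itself is not specified in the abstract; for context, here is a minimal sketch of the kind of baseline spherical-head model such work builds on, using Woodworth's classic ray-tracing formula. Taking the head radius as half the interaural distance and c = 343 m/s are assumptions of the sketch, not values from the paper.

    import numpy as np

    def itd_spherical_head(interaural_distance_m, azimuth_deg, c=343.0):
        """Woodworth spherical-head ITD estimate.

        Valid for azimuths in [0, 90] degrees (source toward one ear);
        other quadrants follow by symmetry. The head radius is taken as
        half the anthropometric interaural distance (an assumption).
        """
        a = interaural_distance_m / 2.0   # effective head radius (m)
        theta = np.radians(azimuth_deg)   # azimuth from the median plane
        # Direct path to the near ear, arc around the sphere to the far ear:
        return (a / c) * (theta + np.sin(theta))

    # Example: 15-cm interaural distance, source at 45 degrees azimuth
    print(round(itd_spherical_head(0.15, 45.0) * 1e6), "microseconds")  # ~326 us
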
The Dynamic Grid: Time-Varying Parameters for Musical Instrument Simulations Based on Finite-Difference Time-Domain Schemes
S. Willemsen, S. Bilbao, M. Ducceschi, S. Serafin
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0043

Several well-established approaches to physical modeling synthesis for musical instruments exist. Finite-difference time-domain methods are known for their generality and flexibility in terms of the systems one can model but are less flexible with regard to smooth parameter variations because of their reliance on a static grid. This paper presents the dynamic grid, a method to smoothly change grid configurations of finite-difference time-domain schemes based on sub-audio-rate time variation of parameters. This allows the behavior of physical models to be extended beyond the physically possible, broadening the range of expressive possibilities for the musician. The method is applied to the 1D wave equation, the stiff string, and 2D systems, including the 2D wave equation and thin plate. Results show that the method does not introduce noticeable artefacts when changing between grid configurations, including for systems with loss.

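For context, the static-grid scheme that the dynamic grid generalizes is the textbook finite-difference update for the 1D wave equation u_tt = c²u_xx. A minimal sketch with illustrative parameter values (this is the baseline static scheme, not the paper's dynamic-grid method):

    import numpy as np

    fs = 44100.0              # sample rate (Hz)
    k = 1.0 / fs              # time step (s)
    c = 300.0                 # wave speed (m/s)
    L = 1.0                   # domain length (m)
    h = c * k                 # grid spacing at the stability (CFL) limit
    N = int(np.floor(L / h))  # number of grid intervals (round down for stability)
    h = L / N                 # spacing of the actual grid
    lam = c * k / h           # Courant number; lam <= 1 guarantees stability

    u = np.zeros(N + 1)       # state at time step n (fixed ends stay zero)
    u[N // 2] = 1.0           # crude initial displacement
    u_prev = u.copy()         # state at step n-1 (zero initial velocity)

    for n in range(1000):
        u_next = np.zeros(N + 1)
        # u^{n+1}_l = 2 u^n_l - u^{n-1}_l + lam^2 (u^n_{l+1} - 2 u^n_l + u^n_{l-1})
        u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                        + lam ** 2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
        u_prev, u = u, u_next

The point of the paper is that N is fixed here once c is chosen; the dynamic grid lets the grid configuration change smoothly as parameters vary, which is exactly the limitation this static sketch exhibits.
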
Assessor Selection Process for Perceptual Quality Evaluation of 360 Audiovisual Content
R. F. Fela, N. Zacharov, Søren Forchhammer
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0037

The Fast Local Sparsity Method: A Low-Cost Combination of Time-Frequency Representations Based on the Hoyer Sparsity
M. D. V. M. da Costa, L. Biscainho
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0036

Nyquist Band Transform: An Order-Preserving Transform for Bandlimited Discretization
Champ C. Darabundit, J. Abel, D. Berners
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0044

A Comparative Study of Music Mastered by Human Engineers and Automated Services
Mitchell Elliott, S. Chon
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0050

Conditioned Source Separation by Attentively Aggregating Frequency Transformations With Self-Conditioning
Woosung Choi, Yeong-Seok Jeong, Jinsung Kim, Jaehwa Chung, Soonyoung Jung, J. Reiss
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0030

Label-conditioned source separation extracts the target source, specified by an input symbol, from an input mixture track. A recently proposed label-conditioned source separation model called Latent Source Attentive Frequency Transformation (LaSAFT)–Gated Point-Wise Convolutional Modulation (GPoCM)–Net introduced a block for latent source analysis called LaSAFT. Employing LaSAFT blocks, it established state-of-the-art performance on several tasks of the MUSDB18 benchmark. This paper enhances the LaSAFT block with a self-conditioning method. Whereas the existing method considers only the symbolic relationships between the target source symbol and latent sources, ignoring audio content, the new approach also takes the audio content into account: the enhanced block computes the attention mask conditioned on both the label and the input audio feature map. It is shown that a conditioned U-Net employing the enhanced LaSAFT blocks outperforms the previous model, and that the present model can also perform audio-query-based separation with a slight modification.

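The internals of the enhanced block are not given in the abstract. The following is a loose, hypothetical sketch of the stated idea only (an attention mask over latent sources conditioned on both the label and a summary of the audio feature map); all module names and dimensions are illustrative, not the authors' implementation.

    import torch
    import torch.nn as nn

    class SelfConditionedLatentSourceAttention(nn.Module):
        """Hypothetical sketch: attention over latent sources computed from
        both the target-source label and the audio content, loosely following
        the abstract's description (not the authors' code)."""

        def __init__(self, n_labels, n_latent, d_model):
            super().__init__()
            self.label_embed = nn.Embedding(n_labels, d_model)
            self.audio_proj = nn.Linear(d_model, d_model)
            self.latent_keys = nn.Parameter(torch.randn(n_latent, d_model))

        def forward(self, feature_map, label):
            # feature_map: (batch, time, d_model); label: (batch,) long tensor
            audio_summary = self.audio_proj(feature_map.mean(dim=1))
            query = self.label_embed(label) + audio_summary  # "self-conditioning"
            scores = query @ self.latent_keys.T / self.latent_keys.shape[1] ** 0.5
            return torch.softmax(scores, dim=-1)  # weights over latent sources

    # The weights would then blend per-latent-source frequency transformations:
    # mask = SelfConditionedLatentSourceAttention(4, 6, 32)(
    #     torch.randn(2, 100, 32), torch.tensor([0, 3]))
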
Deep Audio Effects for Snare Drum Recording Transformations
M. Cheshire, Jake Drysdale, Sean Enderby, Maciej Tomczak, Jason Hockman
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0041

The ability to perceptually modify drum recording parameters in a post-recording process would be of great benefit to engineers limited by time or equipment. In this work, a data-driven approach to post-recording modification of the dampening and microphone positioning parameters commonly associated with snare drum capture is proposed. The system consists of a deep encoder that analyzes audio input and predicts optimal parameters of one or more third-party audio effects, which are then used to process the audio and produce the desired transformed output audio. Furthermore, two novel audio effects are specifically developed to take advantage of the multiple parameter learning abilities of the system. Perceptual quality of the transformations is assessed through a subjective listening test, and an objective evaluation is used to measure system performance. Results demonstrate a capacity to emulate snare dampening; however, attempts to emulate microphone position changes were not successful.

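The abstract describes the system only at a high level. A hypothetical sketch of that shape (a deep encoder mapping an input representation to normalized effect parameters) might look as follows; every architectural choice here is an assumption, not the authors' design.

    import torch
    import torch.nn as nn

    class EffectParameterEncoder(nn.Module):
        """Hypothetical sketch: encode input audio (as a spectrogram) into
        normalized parameters for downstream audio effects. The paper's
        actual encoder and effects are not specified in the abstract."""

        def __init__(self, n_params=4):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_params), nn.Sigmoid(),  # parameters in [0, 1]
            )

        def forward(self, spectrogram):
            # spectrogram: (batch, 1, freq_bins, frames)
            return self.encoder(spectrogram)

    # params = EffectParameterEncoder()(torch.randn(1, 1, 64, 128))  # -> (1, 4)
    # Predicted parameters would be denormalized to each effect's range and
    # passed to the (third-party) effect processors that render the output.
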
Influence of Changes in Audio Spatialization on Immersion in Audiovisual Experiences
Sarvesh Agrawal, S. Bech, K. De Moor, Søren Forchhammer
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0034

Understanding the influence of technical system parameters on audiovisual experiences is important for technologists seeking to optimize those experiences. This study focused on the influence of changes in audio spatialization (varying the loudspeaker configuration for audio rendering from 2.1 to 5.1 to 7.1.4) on the experience of immersion. First, a magnitude estimation experiment was performed to perceptually evaluate envelopment, verifying the precondition that the audio spatialization levels are perceptually distinguishable. Envelopment increased from 2.1 to 5.1 reproduction, but there was no significant benefit of extending from 5.1 to 7.1.4. An absolute-rating experimental paradigm was then used to assess immersion in four audiovisual experiences with 24 participants. Clear differences between immersion scores could not be established, indicating that a change in audio spatialization, and the consequent change in envelopment, does not guarantee a psychologically immersive experience.

Buckling Dielectric Elastomer Transducers as Loudspeakers
Michael Gareis, J. Maas
Journal of the Audio Engineering Society, 2022. https://doi.org/10.17743/jaes.2022.0032