Acoustic scene classification (ASC) and sound event detection (SED) are major topics in environmental sound analysis. Because acoustic scenes and sound events are closely related, previous works have proposed analyzing them jointly using multitask learning (MTL)-based neural networks. Conventional methods train MTL-based models using a linear combination of the ASC and SED loss functions with constant weights. However, the performance of these methods depends strongly on the loss weights, and it is difficult to determine an appropriate balance between them. In this paper, we thus propose dynamic weight adaptation methods for MTL of ASC and SED, based on dynamic weight average (DWA) and multi-focal loss (MFL), that adjust the learning weights automatically. By comparing the two methods, we then clarify how the dynamic adaptation of the loss weights in general, rather than the specific methods of DWA and MFL, benefits the joint analysis of ASC and SED based on MTL. Moreover, we investigate how the training of the joint ASC and SED model progresses dynamically and show how the loss weights affect ASC and SED performance.
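The DWA idea named in the abstract can be sketched as follows. This is a minimal illustration of the commonly used dynamic weight average rule, in which each task's weight is a softmax over the ratio of its two most recent epoch losses, so that a task whose loss is falling more slowly receives a larger weight. The function names, the temperature value of 2.0, and the example loss values are assumptions made for illustration; the abstract does not give the exact formulation used in the paper, and MFL is not covered here.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Return one loss weight per task from the last two epochs' losses.

    prev_losses[k] is task k's loss at epoch t-1 and prev_prev_losses[k]
    its loss at epoch t-2. The temperature (an assumed default) controls
    how strongly the weights react to differences between tasks.
    """
    # Loss ratio per task: close to 1 means the loss has barely decreased.
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    # Softmax over the ratios, scaled so the weights sum to the task count.
    return [num_tasks * e / total for e in exps]


def combined_loss(asc_loss, sed_loss, weights):
    """Weighted sum of the ASC and SED losses (weights[0] is for ASC)."""
    return weights[0] * asc_loss + weights[1] * sed_loss


# Hypothetical losses: ASC fell from 1.0 to 0.9, SED from 1.0 to 0.5.
weights = dwa_weights([0.9, 0.5], [1.0, 1.0])
total_loss = combined_loss(0.9, 0.5, weights)
```

In this hypothetical case the ASC loss is decreasing more slowly, so ASC receives the larger weight in the next epoch. With constant weights, that balance would have to be tuned by hand.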
{"title":"Joint analysis of acoustic scenes and sound events based on multitask learning with dynamic weight adaptation","authors":"Kayo Nada, Keisuke Imoto, Takao Tsuchiya","doi":"10.1250/ast.44.167","DOIUrl":"https://doi.org/10.1250/ast.44.167","url":null,"abstract":"Acoustic scene classification (ASC) and sound event detection (SED) are major topics in environmental sound analysis. Considering that acoustic scenes and sound events are closely related to each other, the joint analysis of acoustic scenes and sound events using multitask learning (MTL)-based neural networks was proposed in some previous works. Conventional methods train MTL-based models using a linear combination of ASC and SED loss functions with constant weights. However, the performance of conventional MTL-based methods depends strongly on the weights of the ASC and SED losses, and it is difficult to determine the appropriate balance between the constant weights of the losses of MTL of ASC and SED. In this paper, we thus propose dynamic weight adaptation methods for MTL of ASC and SED based on dynamic weight average (DWA) and multi-focal loss (MFL) to adjust the learning weights automatically. By comparing the two methods, we then clarify how the dynamic adaptation of the loss weights, rather than specific methods of DWA and MFL, generally benefits the joint analysis of ASC and SED based on MTL. Moreover, we investigate how the training of the joint ASC and SED model dynamically progresses and disclose how the loss weights affect their performance.","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136048369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formant estimation of high-pitched noisy speech using homomorphic deconvolution of higher-order group delay spectrum","authors":"Husne Ara Chowdhury, Mohammad Shahidur Rahman","doi":"10.1250/ast.44.84","DOIUrl":"https://doi.org/10.1250/ast.44.84","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"111 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90367154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heavy-weight floor impact sound of a pure framed structure by field measurement and numerical calculation","authors":"Tomoaki Uemura, N. Hashimoto, Yasuyuki Kondo","doi":"10.1250/ast.44.120","DOIUrl":"https://doi.org/10.1250/ast.44.120","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"63 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84114272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech source separation avoiding initial value dependency by cepstral-basis-decomposed nonnegative matrix factorization","authors":"Fuga Oshima, M. Nakayama","doi":"10.1250/ast.44.137","DOIUrl":"https://doi.org/10.1250/ast.44.137","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"66 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81166227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of stage data for music games with sparse target density","authors":"Atsuhito Udo, N. Aoki, Y. Dobashi","doi":"10.1250/ast.44.49","DOIUrl":"https://doi.org/10.1250/ast.44.49","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"40 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75697803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bit rate required for mono audio object in object-based audio program compressed with MPEG-H 3D Audio","authors":"T. Sugimoto","doi":"10.1250/ast.44.93","DOIUrl":"https://doi.org/10.1250/ast.44.93","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"1 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74799901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of varying corridor parameters on signal-to-noise ratio in classrooms","authors":"Hengling Song","doi":"10.1250/ast.44.110","DOIUrl":"https://doi.org/10.1250/ast.44.110","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"9 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78319794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reports on the implementation of a moving sound source and receiver with directivity in the two-dimensional finite-difference time-domain (FDTD) method. A two-dimensional fundamental solution of a moving monopole source is theoretically derived. Then, a fundamental solution of a moving dipole source is obtained by differentiating the fundamental solution of the monopole source in space. Finally, the directivity of moving monopole, dipole, and cardioid sources is theoretically derived. Numerical experiments on a two-dimensional sound field showed that the effect of the moving velocity on amplitude differs between the monopole and dipole sources. Furthermore, the directivity characteristics of the dipole and cardioid sources were found to vary with the beam steering angle and the moving direction. The present method can be applied accurately to moving sound sources and receivers with directivity.
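As a rough illustration of the setting described above, the following sketch runs a two-dimensional scalar-wave FDTD update with a monopole source that moves along the x axis. It is a generic textbook-style scheme under stated assumptions, not the authors' implementation: the bilinear spreading of the source onto the four surrounding grid nodes, the pressure-release (zero) boundaries, the grid size, and all parameter names and default values are choices made here for illustration. Dipole and cardioid directivity and the moving receiver are not modelled.

```python
import math


def simulate_moving_monopole(n=61, steps=80, courant=0.5, mach=0.3, freq=0.05):
    """Return the 2D pressure field after `steps` leapfrog updates.

    Units are grid cells and time steps. `courant` is c*dt/dx and must stay
    below 1/sqrt(2) for stability in 2D. The source advances by
    mach * courant cells per step and emits a sine of `freq` cycles per step.
    """
    prev = [[0.0] * n for _ in range(n)]
    curr = [[0.0] * n for _ in range(n)]
    c2 = courant ** 2
    x_start = n / 4          # starting x position of the source (in cells)
    y_src = (n - 1) / 2      # source moves along the horizontal centre line
    for step in range(steps):
        nxt = [[0.0] * n for _ in range(n)]
        # Second-order update of the wave equation on interior nodes;
        # edge nodes stay at zero (pressure-release boundary).
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                lap = (curr[i + 1][j] + curr[i - 1][j]
                       + curr[i][j + 1] + curr[i][j - 1]
                       - 4.0 * curr[i][j])
                nxt[i][j] = 2.0 * curr[i][j] - prev[i][j] + c2 * lap
        # Current fractional source position and its enclosing grid cell.
        xs = x_start + mach * courant * step
        i0, j0 = int(xs), int(y_src)
        fx, fy = xs - i0, y_src - j0
        amp = math.sin(2.0 * math.pi * freq * step)
        # Spread the monopole amplitude over the four neighbouring nodes.
        for di, wx in ((0, 1.0 - fx), (1, fx)):
            for dj, wy in ((0, 1.0 - fy), (1, fy)):
                nxt[i0 + di][j0 + dj] += wx * wy * amp
        prev, curr = curr, nxt
    return curr


field = simulate_moving_monopole()
```

Setting `mach=0.0` gives a stationary source for comparison. Because the boundaries here reflect, only the early part of the simulation resembles free-field radiation.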
{"title":"Two-dimensional finite-difference time-domain simulation of moving sound source and receiver with directivity","authors":"Takao Tsuchiya, Yusuke Makino, Yu Teshima, Shizuko Hiryu","doi":"10.1250/ast.44.101","DOIUrl":"https://doi.org/10.1250/ast.44.101","url":null,"abstract":"This paper reports on the implementation of a moving sound source and receiver with directivity in the two-dimensional finite-difference time-domain (FDTD) method. A two-dimensional fundamental solution of a moving monopole source is theoretically derived. Then, a fundamental solution of a moving dipole source is obtained by differentiating the fundamental solution of a monopole source in space. Finally, the directivity of moving monopole, dipole, and cardioid sources is theoretically derived. Numerical experiments performed on the two-dimensional sound field showed that the effect of moving velocity on amplitude differs for the monopole and dipole sources. Furthermore, it was found that directivity characteristics of dipole and cardioid sources vary depending on the beam steering angle and moving direction. The present method can be accurately applied to the moving sound source and receiver with directivity.","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136051928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Abstracts of Papers in the Journal of the Acoustical Society of Japan (J)","authors":"","doi":"10.1250/ast.44.155","DOIUrl":"https://doi.org/10.1250/ast.44.155","url":null,"abstract":"","PeriodicalId":46068,"journal":{"name":"Acoustical Science and Technology","volume":"293 1","pages":""},"PeriodicalIF":0.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78510854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}