
Eurasip Journal on Audio Speech and Music Processing: latest publications

Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-02-03. DOI: 10.1186/s13636-024-00331-z
Sivaramakrishna Yecchuri, Sunny Dayal Vanambathina
Recent advancements in deep learning-based speech enhancement models have made extensive use of attention mechanisms, demonstrating their effectiveness in achieving state-of-the-art performance. This paper proposes a transformer attention network based sub-convolutional U-Net (TANSCUNet) for speech enhancement. Instead of adopting conventional RNNs and temporal convolutional networks for sequence modeling, we employ a novel transformer-based attention network between the sub-convolutional U-Net encoder and decoder for better feature learning. More specifically, it is composed of several adaptive time-frequency attention modules and an adaptive hierarchical attention module, aiming to capture long-term time-frequency dependencies and further aggregate hierarchical contextual information. Additionally, a sub-convolutional encoder-decoder model uses different kernel sizes to extract multi-scale local and contextual features from the noisy speech. The experimental results show that the proposed model outperforms several state-of-the-art methods.
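The abstract gives only a high-level description of the sub-convolutional encoder; as a rough illustration of the multi-kernel idea, the sketch below applies parallel 1-D convolutions with different kernel sizes to the noisy waveform and concatenates the outputs. The kernel sizes, channel counts, and activation are assumptions for illustration, not the authors' configuration.

```python
# Hypothetical multi-scale sub-convolutional block; kernel sizes and channel
# counts are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class MultiScaleConvBlock(nn.Module):
    def __init__(self, in_ch=1, out_ch=16, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One convolution branch per kernel size; "same" padding keeps the time axis aligned.
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.act = nn.PReLU()

    def forward(self, x):                        # x: (batch, in_ch, time)
        outs = [branch(x) for branch in self.branches]
        return self.act(torch.cat(outs, dim=1))  # (batch, out_ch * n_branches, time)

block = MultiScaleConvBlock()
noisy = torch.randn(2, 1, 16000)                 # two 1-s noisy waveforms at 16 kHz
features = block(noisy)                          # multi-scale local features
```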
Citations: 0
Acoustical feature analysis and optimization for aesthetic recognition of Chinese traditional music
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-02-02. DOI: 10.1186/s13636-023-00326-2
Lingyun Xie, Yuehong Wang, Yan Gao
Chinese traditional music, a vital expression of Chinese cultural heritage, possesses both a profound emotional resonance and artistic allure. This study sets out to refine and analyze the acoustical features essential for the aesthetic recognition of Chinese traditional music, utilizing a dataset spanning five aesthetic genres. Through recursive feature elimination, we distilled an initial set of 447 low-level physical features to a more manageable 44 and established their feature-importance coefficients. After correlating these features with higher-level musical components, this reduction allowed us to quantify the influence of those components on aesthetic recognition. We conducted a comprehensive examination of the impact of various musical elements on aesthetic genres. Our findings indicate that the selected 44-dimensional feature set can enhance aesthetic recognition. Among the high-level musical factors, timbre emerges as the most influential, followed by rhythm, pitch, and tonality. Timbre proved pivotal in distinguishing between the JiYang and BeiShang genres, while rhythm and tonality were key in differentiating LingDong from JiYang, as well as LingDong from BeiShang.
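As a minimal sketch of the feature-reduction step described above (distilling 447 low-level features to 44 with recursive feature elimination), the snippet below uses scikit-learn's RFE. The random-forest ranker and the synthetic data are assumptions; only the 447-to-44 reduction follows the abstract.

```python
# Illustrative recursive feature elimination from 447 to 44 features.
# The random-forest ranker and synthetic data are assumptions for the sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X = np.random.randn(500, 447)           # 500 excerpts x 447 low-level features
y = np.random.randint(0, 5, size=500)   # five aesthetic genres

selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=44, step=20)
selector.fit(X, y)

selected_idx = np.where(selector.support_)[0]            # indices of the 44 kept features
importances = selector.estimator_.feature_importances_   # importances of the kept features
```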
Citations: 0
Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-20. DOI: 10.1186/s13636-023-00325-3
Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, Adane Mamuye
Speech coding is a method for reducing the amount of data needed to represent speech signals by exploiting their statistical properties. Recently, neural network prediction models have gained attention in speech coding for reconstructing nonlinear and nonstationary speech signals. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. The GRU predictor model is trained on a data set combining actual speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus with the corresponding ADPCM fixed-predictor output speech samples. Our contribution lies in the development of an algorithm for training the GRU predictive model that improves its speech coding prediction performance, together with a new offline-trained predictive model for the speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.
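The abstract does not detail the codec structure; the following sketch shows the general shape of an ADPCM-style loop in which a neural predictor estimates the next sample from past reconstructed samples and only the quantized residual is encoded. The GRU size, context length, and uniform quantizer step are illustrative assumptions, not the authors' trained system.

```python
# Sketch of an ADPCM-style loop with a neural predictor: the predictor estimates the
# next sample from past reconstructed samples, and only the quantized residual is
# transmitted. GRU size, context length, and quantizer step are illustrative.
import torch
import torch.nn as nn

class GRUPredictor(nn.Module):
    def __init__(self, hidden=32, context=16):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.context = context

    def forward(self, history):                    # history: (1, context, 1)
        out, _ = self.gru(history)
        return self.head(out[:, -1, :]).squeeze()  # predicted next sample

def encode(signal, predictor, step=0.01):
    """Return residual quantizer indices; a decoder would mirror the same loop."""
    codes = []
    recon = torch.zeros(predictor.context)         # reconstructed past samples
    with torch.no_grad():
        for x in signal:
            pred = predictor(recon[-predictor.context:].view(1, -1, 1))
            code = torch.round((x - pred) / step)  # uniform residual quantizer
            codes.append(int(code))
            recon = torch.cat([recon, (pred + code * step).reshape(1)])
    return codes

codes = encode(torch.sin(torch.linspace(0, 6.28, 200)), GRUPredictor())
```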
Citations: 0
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-15. DOI: 10.1186/s13636-023-00314-6
Shangda Wu, Yue Yang, Zhaowen Wang, Xiaobing Li, Maosong Sun
Melody harmonization, which involves generating a chord progression that complements a user-provided melody, continues to pose a significant challenge. A chord progression must not only be in harmony with the melody but must also interlock with its rhythmic pattern. While previous neural network-based systems have been successful in producing chord progressions for given melodies, they have not adequately addressed controllable melody harmonization, nor have they focused on generating harmonic rhythms with flexibility in the rates or patterns of chord changes. This paper presents AutoHarmonizer, a novel system for harmonic density-controllable melody harmonization with such a flexible harmonic rhythm. AutoHarmonizer is equipped with an extensive vocabulary of 1462 chord types and can generate chord progressions that vary in harmonic density for a given melody. Experimental results indicate that the AutoHarmonizer-generated chord progressions exhibit a diverse range of harmonic rhythms and that the system's controllable harmonic density is effective.
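The mechanism behind the harmonic density control is not described in the abstract; the toy sketch below merely illustrates the general idea of a density coefficient that biases per-beat chord-change decisions. It is not AutoHarmonizer's actual decoding procedure, and the chord vocabulary here is hypothetical.

```python
# Toy illustration of density-controlled chord decoding: a coefficient in [0, 1]
# scales the probability of changing chord at each beat. This is only a sketch of
# the general idea, not AutoHarmonizer's decoding procedure.
import numpy as np

rng = np.random.default_rng(0)
chords = ["C", "Dm", "Em", "F", "G", "Am"]        # tiny vocabulary for the example

def decode(n_beats, density):
    """Higher density -> more frequent chord changes (denser harmonic rhythm)."""
    progression = [rng.choice(chords)]
    for _ in range(n_beats - 1):
        if rng.random() < density:                # change chord with probability = density
            progression.append(rng.choice(chords))
        else:                                     # otherwise sustain the previous chord
            progression.append(progression[-1])
    return progression

print(decode(8, density=0.2))   # sparse harmonic rhythm
print(decode(8, density=0.8))   # dense harmonic rhythm
```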
Citations: 0
Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-15. DOI: 10.1186/s13636-023-00319-1
Stijn Kindt, Jenthe Thienpondt, Luca Becker, Nilesh Madhu
Correction: EURASIP Journal on Audio, Speech, and Music Processing 2023, 46 (2023). https://doi.org/10.1186/s13636-023-00310-w

Following publication of the original article [1], the authors were notified that Figure 14 contained an additional bottom row in each cluster subfigure. These rows have been removed, and the original article has been corrected. The correction notice shows the originally published and corrected versions of Figure 14 (figure images omitted here).

Reference: 1. S. Kindt, J. Thienpondt, L. Becker, N. Madhu, Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios. EURASIP J. Audio Speech Music Process. 2023, 46 (2023). https://doi.org/10.1186/s13636-023-00310-w
Citations: 0
Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-11. DOI: 10.1186/s13636-024-00327-9
Junya Koguchi, Masanori Morise
Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because of its similarity to speech in terms of generating sounds from symbols. Moreover, a plucked string instrument such as the electric bass guitar (EBG) shares acoustical similarities with speech. We propose an attack-sustain (AS) representation of the playing technique to take advantage of this similarity. The AS representation treats the attack segment as an unvoiced consonant and the sustain segment as a voiced vowel. In addition, we propose a MISS framework for the EBG that can control its playing techniques: (1) we constructed an EBG sound database containing a rich set of playing techniques, (2) we developed dynamic time warping and timbre conversion procedures to align the sounds and AS labels, and (3) we extended an existing MISS framework to control playing techniques using the AS representation as control symbols. The experimental evaluation suggests that our AS representation effectively controls the playing techniques and improves the naturalness of the synthetic sound.
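As a hedged illustration of the attack-sustain representation, the sketch below converts hypothetical note events into per-frame AS labels, treating a short attack segment like an unvoiced consonant and the remainder like a voiced vowel. The event format, frame rate, and 30 ms attack length are assumptions, not the paper's specification.

```python
# Minimal sketch of turning note events into attack-sustain (AS) symbol sequences,
# analogous to consonant/vowel labels in a TTS front end. The event format and
# symbol names are assumptions for illustration only.
def notes_to_as_labels(notes, frame_rate=100):
    """notes: list of (onset_s, duration_s, technique); returns per-frame labels."""
    total_frames = int(max(on + dur for on, dur, _ in notes) * frame_rate)
    labels = ["sil"] * total_frames
    for onset, duration, technique in notes:
        start = int(onset * frame_rate)
        end = int((onset + duration) * frame_rate)
        attack_end = min(start + int(0.03 * frame_rate), end)  # ~30 ms attack segment
        for i in range(start, attack_end):
            labels[i] = f"{technique}_attack"      # "unvoiced consonant"-like symbol
        for i in range(attack_end, end):
            labels[i] = f"{technique}_sustain"     # "voiced vowel"-like symbol
    return labels

labels = notes_to_as_labels([(0.0, 0.5, "pick"), (0.5, 0.4, "slap")])
```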
Citations: 0
Significance of relative phase features for shouted and normal speech classification
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-06. DOI: 10.1186/s13636-023-00324-4
Khomdet Phapatanaburi, Longbiao Wang, Meng Liu, Seiichi Nakagawa, Talit Jumphoo, Peerapong Uthansakul
Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP) and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than those of the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields an improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.
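For readers unfamiliar with relative phase, the sketch below shows one common formulation: per frame, each bin's phase is shifted in proportion to its frequency so that the phase at a chosen base frequency is fixed. The window, base bin, and frame length are assumptions, and the LPAES/LPR variants and glottal-source processing from the paper are not reproduced here.

```python
# Sketch of the relative-phase idea: per frame, the spectral phase is re-referenced
# so that the phase at a chosen base frequency bin is fixed, removing the dependence
# on the analysis window position. Window, hop, and base-bin values are assumptions.
import numpy as np

def relative_phase(frame, base_bin=1):
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    theta = np.angle(spec)
    bins = np.arange(len(spec))
    # Shift each bin's phase in proportion to its frequency so the base bin's phase is 0.
    psi = theta - (bins / base_bin) * theta[base_bin]
    return np.angle(np.exp(1j * psi))            # wrap back to (-pi, pi]

frame = np.random.randn(400)                      # one 25 ms frame at 16 kHz
rp = relative_phase(frame)
```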
Citations: 0
Deep semantic learning for acoustic scene classification
IF 2.4, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-03. DOI: 10.1186/s13636-023-00323-5
Yun-Fei Shao, Xin-Xin Ma, Yong Ma, Wei-Qiang Zhang
Acoustic scene classification (ASC) is the process of identifying the acoustic environment or scene from which an audio signal is recorded. In this work, we propose an encoder-decoder-based approach to ASC, borrowed from SegNet in image semantic segmentation tasks. We also propose a novel feature normalization method named Mixup Normalization, which combines channel-wise instance normalization and the Mixup method to learn information useful for the scene while discarding device-specific information. In addition, we propose an event extraction block, which extracts the accurate semantic segmentation region from the segmentation network, to imitate the effect of image segmentation on audio features. With four data augmentation techniques, our best single system achieved an average accuracy of 71.26% across different devices on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 ASC Task 1A dataset. This result indicates a margin of at least 17% over the DCASE 2020 challenge Task 1A baseline system. The system has lower complexity and higher performance compared with other state-of-the-art CNN models, without using any supplementary data other than the official challenge dataset.
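The exact definition of Mixup Normalization is not given in the abstract; the sketch below shows one plausible reading, in which spectrograms are instance-normalized per channel and then re-scaled with statistics mixed between randomly paired samples in the batch. This mechanism is an assumption for illustration only.

```python
# Rough sketch of one way to combine instance normalization with Mixup: each
# spectrogram is instance-normalized per channel, then re-scaled with statistics
# mixed between the sample and a randomly paired sample. This is an assumption
# about the mechanism, not the paper's exact definition.
import torch

def mixup_instance_norm(x, alpha=0.4, eps=1e-5):
    """x: (batch, channels, freq, time) log-mel spectrograms."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    normed = (x - mean) / std                         # channel-wise instance norm
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    mixed_mean = lam * mean + (1 - lam) * mean[perm]  # mix statistics across samples
    mixed_std = lam * std + (1 - lam) * std[perm]
    return normed * mixed_std + mixed_mean

specs = torch.randn(8, 3, 128, 431)                   # batch of 8 three-channel spectrograms
out = mixup_instance_norm(specs)
```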
Citations: 0
Steered Response Power for Sound Source Localization: a tutorial review.
IF 1.7, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-01. Epub Date: 2024-11-12. DOI: 10.1186/s13636-024-00377-z
Eric Grinstein, Elisa Tengan, Bilgesu Çakmak, Thomas Dietzen, Leonardo Nunes, Toon van Waterschoot, Mike Brookes, Patrick A Naylor

In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance in moderately reverberant and noisy scenarios. Many works have analysed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm which allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm which includes selected extensions from the literature.
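As a minimal illustration of the SRP-PHAT principle reviewed in the paper, the sketch below steers a PHAT-weighted cross-power spectrum over a grid of candidate delays for a two-microphone array. The geometry, grid, and signals are illustrative; the authors' X-SRP Python implementation should be consulted for a full, modular version.

```python
# Minimal SRP-PHAT sketch for a two-microphone array: a PHAT-weighted cross-power
# spectrum is steered over a grid of candidate inter-microphone delays.
# Sample rate, signals, and delay grid are illustrative assumptions.
import numpy as np

def srp_phat_two_mics(x1, x2, fs, max_tau, n_grid=201):
    n = len(x1)
    X1 = np.fft.rfft(x1, n=2 * n)
    X2 = np.fft.rfft(x2, n=2 * n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12                    # PHAT weighting
    freqs = np.fft.rfftfreq(2 * n, d=1.0 / fs)
    taus = np.linspace(-max_tau, max_tau, n_grid)     # candidate delays of mic 2 vs. mic 1
    # Steered response power: coherently sum the weighted spectrum at each delay.
    power = np.array([np.real(np.sum(cross * np.exp(2j * np.pi * freqs * t)))
                      for t in taus])
    return taus, power

fs = 16000
sig = np.random.randn(4096)
x1, x2 = sig, np.roll(sig, 5)                         # mic 2 receives a 5-sample-delayed copy
taus, power = srp_phat_two_mics(x1, x2, fs, max_tau=10 / fs)
est_delay = taus[np.argmax(power)] * fs               # close to the simulated 5-sample delay
```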

Citations: 0
A framework for the acoustic simulation of passing vehicles using variable length delay lines.
IF 1.7, CAS Tier 3 (Computer Science), Q2 ACOUSTICS. Pub Date: 2024-01-01. Epub Date: 2024-10-03. DOI: 10.1186/s13636-024-00372-4
Stefano Damiano, Luca Bondi, Andre Guntoro, Toon van Waterschoot

The sound produced by vehicles driving on roadways constitutes one of the dominant noise sources in urban areas. The impact of traffic noise on human activities, together with the related investigation of modeling, assessment, and abatement strategies, has fueled research on simulating the sound produced by individual passing vehicles. Simulators make it possible to perceptually assess the nature of traffic noise and the impact of single road agents on the overall soundscape. In this work, we present TrafficSoundSim, an open-source framework for the acoustic simulation of vehicles transiting on a road. We first discuss the generation of the sound signal produced by a vehicle, represented as a combination of road/tire interaction noise and engine noise. We then introduce a propagation model based on variable length delay lines, allowing the simulation of acoustic propagation and the Doppler effect. The proposed simulator incorporates the effects of air absorption and ground reflection, modeled via complex-valued reflection coefficients dependent on the road surface impedance, as well as a model of the directivity of the sound sources representing the passing vehicles. The source signal generation and propagation stages are decoupled, and all effects are implemented using finite impulse response filters. Moreover, no recorded data is required to run the simulation, making the framework flexible and independent of data availability. Finally, to validate the framework's ability to accurately simulate passing vehicles, a comparison between synthetic and recorded pass-by events is presented. The validation shows that sounds generated with the proposed method match recorded events well in terms of power spectral density and psychoacoustic metrics, and yield perceptually plausible results.
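As a rough illustration of the variable length delay line at the core of the propagation model, the sketch below reads a source signal at a time-varying fractional delay derived from the distance between a moving source and a static listener, which reproduces the Doppler shift of a pass-by. The geometry, speed, sample rate, and simple 1/r attenuation are assumptions, not TrafficSoundSim's implementation.

```python
# Minimal sketch of a variable length delay line: the received signal is read from
# the source signal at a time-varying delay given by the source-receiver distance,
# with linear interpolation between samples. This reproduces the Doppler shift of a
# passing source; geometry, speed, and sample rate are illustrative assumptions.
import numpy as np

fs, c = 44100, 343.0                         # sample rate [Hz], speed of sound [m/s]
t = np.arange(0, 3.0, 1.0 / fs)              # 3 s of simulation
source = np.sin(2 * np.pi * 440.0 * t)       # 440 Hz source signal

# Vehicle drives past at 20 m/s on a line 5 m from the listener.
x = -30.0 + 20.0 * t                         # source position along the road [m]
distance = np.sqrt(x**2 + 5.0**2)
delay = distance / c * fs                    # propagation delay in samples (fractional)

read_pos = np.arange(len(t)) - delay         # variable-length delay-line read pointer
idx = np.clip(np.floor(read_pos).astype(int), 0, len(t) - 2)
frac = np.clip(read_pos - idx, 0.0, 1.0)
received = (1 - frac) * source[idx] + frac * source[idx + 1]   # linear interpolation
received /= np.maximum(distance, 1.0)        # simple 1/r distance attenuation
received[read_pos < 0] = 0.0                 # silence before the first wavefront arrives
```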

Citations: 0