
Eurasip Journal on Audio Speech and Music Processing: Latest Articles

Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-05-28, DOI: 10.1186/s13636-024-00351-9
Zhiyong Chen, Zhiqi Ai, Youxuan Ma, Xinnuo Li, Shugong Xu
In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech by referring to reference speech, voice cloning (VC), or zero-shot TTS (ZS-TTS), stands out as an important subtask. A primary challenge in VC is maintaining speech quality and speaker similarity with limited reference data for a specific speaker. However, existing VC systems often rely on naive combinations of embedded speaker vectors for speaker control, which compromises the capture of speaking style, voice print, and semantic accuracy. To overcome this, we introduce the Two-branch Speaker Control Module (TSCM), a novel and highly adaptable voice cloning module designed to precisely process speaker or style control for a target speaker. Our method fuses local-level features from a Gated Convolutional Network (GCN) with utterance-level features from a gated recurrent unit (GRU) to enhance speaker control. We demonstrate the effectiveness of TSCM by integrating it into advanced TTS systems such as FastSpeech 2 and VITS, significantly improving their performance. Experimental results show that TSCM enables accurate voice cloning for a target speaker with minimal data through either zero-shot or few-shot fine-tuning of pretrained TTS models. Furthermore, our TSCM-based VITS (TSCM-VITS) showcases superior performance in zero-shot scenarios compared to existing state-of-the-art VC systems, even with basic dataset configurations. Our method’s superiority is validated through comprehensive subjective and objective evaluations. A demonstration of our system is available at https://great-research.github.io/tsct-tts-demo/, providing practical insights into its application and effectiveness.
Citations: 0
Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-05-22, DOI: 10.1186/s13636-024-00350-w
Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann
Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space. The difficulty is related to the fact that in a complex auditory scene, the information about the tracked object is not available in a clean form. The more cluttered the sound mixture, the more time and frequency regions where the object of interest is masked by other sound sources. How does the auditory system recognize and follow acoustic objects based on this fragmentary information? Numerous studies highlight the crucial role of top-down processing in this task. Having in mind both auditory modeling and signal processing applications, we investigated how computational methods with and without top-down processing deal with increasing sparsity of the auditory features in the task of estimating instantaneous voice states, defined as a combination of three parameters: fundamental frequency F0 and formant frequencies F1 and F2. We found that the benefit from top-down processing grows with increasing sparseness of the auditory data.
Citations: 0
Sampling the user controls in neural modeling of audio devices
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-05-20, DOI: 10.1186/s13636-024-00347-5
Otto Mikkonen, Alec Wright, Vesa Välimäki
This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user controls seen during training affects network generalization. To study the problem, a large corpus of training datasets is synthetically generated using SPICE simulations of two distinct devices, an analog equalizer and an analog distortion pedal. A proven recurrent neural network architecture is trained using each dataset. The difference in the datasets is in the sampling resolution of the device user controls and in their overall size. Based on objective and subjective evaluation of the trained models, a sampling resolution of five for the device parameters is found to be sufficient to capture the behavior of the target systems for the types of devices considered during the study. This result is desirable, since a dense sampling grid can be impractical to realize in the general case when no automated way of setting the device parameters is available, while collecting large amounts of data using a sparse grid only incurs small additional costs. Thus, the result provides guidance for efficient collection of training data for neural modeling of other similar audio devices.
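As a rough illustration of what a sampling resolution of five means for data collection, the sketch below enumerates a grid of control settings. It assumes each user control is normalized to [0, 1] and sampled at evenly spaced positions; the function name `control_grid` and the normalization are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def control_grid(num_controls, resolution=5):
    """Enumerate every combination of control settings on an even grid.

    Assumes each control is normalized to the range [0, 1] (hypothetical).
    """
    if num_controls < 1:
        raise ValueError("num_controls must be at least 1")
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    # Evenly spaced positions, including both end stops of the control.
    steps = [i / (resolution - 1) for i in range(resolution)]
    return list(product(steps, repeat=num_controls))


if __name__ == "__main__":
    grid = control_grid(num_controls=2)
    print(len(grid))   # number of settings to record for two controls
    print(grid[:3])    # first few (control_1, control_2) pairs
```

The grid size is `resolution ** num_controls`, so two controls at resolution five require 25 recorded settings and three controls require 125, which is why a sparse grid matters when the controls must be set by hand.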
Citations: 0
Continuous lipreading based on acoustic temporal alignments
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-05-06, DOI: 10.1186/s13636-024-00345-7
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. Current state of the art employs powerful end-to-end architectures based on deep learning which depend on large amounts of data and high computational resources for their estimation. We address the task of VSR for data scarcity scenarios with limited computational resources by using traditional approaches based on hidden Markov models. We present a novel learning strategy that employs information obtained from previous acoustic temporal alignments to improve the visual system performance. Furthermore, we studied multiple visual speech representations and how image resolution or frame rate affect its performance. All these experiments were conducted on the limited data VLRF corpus, a database which offers an audio-visual support to address continuous speech recognition in Spanish. The results show that our approach significantly outperforms the best results achieved on the task to date.
Citations: 0
Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-05-01, DOI: 10.1186/s13636-024-00343-9
Tomasz Wojnar, Jarosław Hryszko, Adam Roman
This article introduces Mi-Go, a tool aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The tool leverages YouTube as a rich and continuously updated data source, accounting for multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the effectiveness of the tool, an experiment was conducted, by using Mi-Go to evaluate state-of-the-art automatic speech recognition machine learning models. The evaluation involved a total of 141 randomly selected YouTube videos. The results underscore the utility of YouTube as a valuable data source for evaluation of speech recognition models, ensuring their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting the machine-generated transcriptions against human-made subtitles, the Mi-Go tool can help pinpoint potential misuse of YouTube subtitles, like search engine optimization.
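One common way to contrast a machine-generated transcription with human-made subtitles is the word error rate (WER). The abstract does not name the metric Mi-Go uses, so treating WER as the comparison and the subtitles as the reference is an assumption made here for illustration; `word_error_rate` is a hypothetical helper, not part of the tool.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    subtitles = "the cat sat on the mat"   # human-made reference
    transcript = "the cat sit on mat"      # machine output
    print(word_error_rate(subtitles, transcript))
```

A very high WER on a video whose audio is otherwise clear could hint that the subtitles do not actually transcribe the speech, which is the kind of mismatch the abstract describes as potential subtitle misuse.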
Citations: 0
Exploring the power of pure attention mechanisms in blind room parameter estimation
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-04-24, DOI: 10.1186/s13636-024-00344-8
Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin
Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local room acoustic characteristics is crucial when designing audio filters for various audio rendering applications. Key parameters in this context include reverberation time (RT$_{60}$) and geometric room volume. In recent years, neural networks have been extensively applied in the task of blind room parameter estimation. However, there remains a question of whether pure attention mechanisms can achieve superior performance in this task. To address this issue, this study employs blind room parameter estimation based on monaural noisy speech signals. Various model architectures are investigated, including a proposed attention-based model. This model is a convolution-free Audio Spectrogram Transformer, utilizing patch splitting, attention mechanisms, and cross-modality transfer learning from a pretrained Vision Transformer. Experimental results suggest that the proposed attention mechanism-based model, relying purely on attention mechanisms without using convolution, exhibits significantly improved performance across various room parameter estimation tasks, especially with the help of dedicated pretraining and data augmentation schemes. Additionally, the model demonstrates more advantageous adaptability and robustness when handling variable-length audio inputs compared to existing methods.
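The patch-splitting step mentioned above can be pictured as cutting the spectrogram into small rectangles that are flattened into token vectors before attention is applied. The sketch below is a minimal illustration on a nested-list spectrogram; the patch size, the stride, and the function name `split_patches` are hypothetical, since the abstract does not give the model's actual settings.

```python
def split_patches(spec, patch=16, stride=16):
    """Cut a 2-D spectrogram (list of rows) into flattened square patches.

    Patches that would run past the edge are dropped. A stride smaller
    than the patch size yields overlapping patches.
    """
    rows, cols = len(spec), len(spec[0])
    patches = []
    for r in range(0, rows - patch + 1, stride):
        for c in range(0, cols - patch + 1, stride):
            # Flatten the patch row by row into one token vector.
            patches.append(
                [value for row in spec[r:r + patch] for value in row[c:c + patch]]
            )
    return patches


if __name__ == "__main__":
    # Toy "spectrogram": 4 frequency bins by 6 time frames.
    toy = [[r * 6 + c for c in range(6)] for r in range(4)]
    tokens = split_patches(toy, patch=2, stride=2)
    print(len(tokens), tokens[0])
```

Because the number of patches follows from the input width, a longer recording simply produces more tokens, which is one reason a patch-based model can accept variable-length audio.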
Citations: 0
Robust acoustic reflector localization using a modified EM algorithm
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-04-18, DOI: 10.1186/s13636-024-00340-y
Usama Saqib, Mads Græsbøll Christensen, Jesper Rindom Jensen
In robotics, echolocation has been used to detect acoustic reflectors, e.g., walls, as it helps the robotic platform navigate in darkness and detect transparent surfaces. However, the transfer function or response of an acoustic system, e.g., loudspeakers/emitters, contributes to non-ideal behavior within the acoustic systems that can contribute to a phase lag due to propagation delay. This non-ideal response can hinder the performance of a time-of-arrival (TOA) estimator intended for acoustic reflector localization, especially when the estimation of multiple reflections is required. In this paper, we therefore propose a robust expectation-maximization (EM) algorithm that takes into account the response of acoustic systems to enhance the TOA estimation accuracy when estimating multiple reflections with the robot placed in a corner of a room. A non-ideal transfer function is built with two parameters, which are estimated recursively within the estimator. To test the proposed method, a hardware proof-of-concept setup was built with two different designs. The experimental results show that the proposed method could detect an acoustic reflector up to a distance of 1.6 m with 60% accuracy at a signal-to-noise ratio (SNR) of 0 dB. Compared to the state-of-the-art EM algorithm, our proposed method improves TOA estimation performance by 10% at a low SNR value.
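To make the link between a TOA estimate and a reflector distance concrete, the sketch below converts between the two. It assumes a collocated emitter and microphone (so the sound travels to the wall and back) and a speed of sound of 343 m/s; both are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from the paper.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def reflector_distance(toa_s, speed=SPEED_OF_SOUND_M_S):
    """Distance to a reflector from a round-trip time of arrival in seconds."""
    if toa_s < 0:
        raise ValueError("time of arrival must be non-negative")
    # The echo covers the emitter-to-reflector path twice.
    return speed * toa_s / 2.0


def round_trip_toa(distance_m, speed=SPEED_OF_SOUND_M_S):
    """Round-trip time of arrival in seconds for a reflector at distance_m."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_m / speed


if __name__ == "__main__":
    toa = round_trip_toa(1.6)
    print(f"{toa * 1e3:.2f} ms round trip for a wall at 1.6 m")
    print(f"{reflector_distance(toa):.2f} m recovered from that TOA")
```

Under these assumptions a wall at 1.6 m corresponds to a round trip of about 9.3 ms, so a TOA error of 0.1 ms shifts the distance estimate by roughly 1.7 cm.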
Citations: 0
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
IF 2.4, Zone 3 (Computer Science), Q2 Acoustics, Pub Date: 2024-04-11, DOI: 10.1186/s13636-024-00341-x
Zehua Zhang, Lu Zhang, Xuyi Zhuang, Yukun Qian, Mingjiang Wang
Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study develops robust, deep neural network (DNN)-based speech enhancement methods. We reproduce several DNN-based monaural speech enhancement methods and outline a strategy for constructing datasets. This strategy, validated through experimental reproductions, has effectively enhanced the denoising efficiency and robustness of the models. Then, we propose a causal speech enhancement system named Supervised Attention Multi-Scale Temporal Convolutional Network (SA-MSTCN). SA-MSTCN extracts the complex compressed spectrum (CCS) for input encoding and employs complex ratio masking (CRM) for output decoding. The supervised attention module, a lightweight addition to SA-MSTCN, guides feature extraction. Experiment results show that the supervised attention module effectively improves noise reduction performance with a minor increase in computational cost. The multi-scale temporal convolutional network refines the perceptual field and better reconstructs the speech signal. Overall, SA-MSTCN not only achieves state-of-the-art speech quality and intelligibility compared to other methods but also maintains stable denoising performance across various environments.
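Complex ratio masking, used above for output decoding, multiplies each time-frequency bin of the noisy spectrum by a complex-valued mask so that both magnitude and phase are corrected. The sketch below shows only that arithmetic, under clearly stated assumptions: plain lists of Python complex numbers stand in for STFT bins, the mask is an oracle computed from known clean speech (whereas SA-MSTCN predicts its mask with a network), and the spectrum compression step is omitted. The function names are hypothetical.

```python
def ideal_complex_ratio_mask(clean, noisy, eps=1e-12):
    """Oracle mask M such that M * noisy equals clean in every bin."""
    mask = []
    for s, y in zip(clean, noisy):
        # Guard against dividing by a (near-)silent noisy bin.
        mask.append(s / y if abs(y) > eps else 0j)
    return mask


def apply_complex_mask(noisy, mask):
    """Enhance the noisy spectrum by bin-wise complex multiplication."""
    return [m * y for m, y in zip(mask, noisy)]


if __name__ == "__main__":
    clean = [1 + 2j, -0.5 + 0.25j, 0.3j]
    noise = [0.2 - 0.1j, 0.4 + 0.4j, -0.1 + 0j]
    noisy = [s + n for s, n in zip(clean, noise)]
    mask = ideal_complex_ratio_mask(clean, noisy)
    enhanced = apply_complex_mask(noisy, mask)
    print(max(abs(e - s) for e, s in zip(enhanced, clean)))
```

With the oracle mask the clean bins are recovered up to floating-point error; a learned mask only approximates this, and the gap between the two is what a masking-based enhancement network is trained to close.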
Citations: 0
Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
IF 2.4 Zone 3 (Computer Science) Q2 ACOUSTICS Pub Date: 2024-04-11 DOI: 10.1186/s13636-024-00342-w
Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan

Correction: EURASIP J. Audio Speech Music Process 2024, 18 (2024)

https://doi.org/10.1186/s13636-024-00335-9


Following publication of the original article [1], we have been notified that:

-Equation 9 was missing from the paper, therefore all equations have been renumbered.

-The title should be modified from “DeepDet: YAMNet with BottleNeck Attention Module (BAM) TTS synthesis detection” to “DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection”.

-The Acknowledgements section needs to include the following statement:

The authors extend their appreciation to King Saud University for funding this work through Researchers Supporting Project number (RSPD2024R1006), King Saud University, Riyadh, Saudi Arabia.

-The below text in the Funding section has been removed:

The authors extend their appreciation to the Deputyship for Research and Innovation, “Ministry of Education” in Saudi Arabia for funding this research (IFKSUOR3–561–2).

The original article has been corrected.

  1. Mahum et al., DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection. EURASIP J. Audio Speech Music Process. 2024, 18 (2024). https://doi.org/10.1186/s13636-024-00335-9


Authors and Affiliations

  1. Computer Science Department, UET Taxila, Taxila, Pakistan

    Rabbia Mahum & Aun Irtaza

  2. Software Engineering Department, UET Taxila, Taxila, Pakistan

    Ali Javed

  3. Industrial Engineering Department, College of Engineering, King Saud University, 11421, Riyadh, Saudi Arabia

    Haitham A. Mahmoud

  4. College of Big Data and Internet, Shenzhen Technology University (SZTU), Shenzhen, China

    Haseeb Hassan

Corresponding author

Correspondence to Rabbia Mahum.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Cite this article

Mahum, R., Irtaza, A., Javed, A. et al. Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection. J audio speech music proc. 2024, 21 (2024). https://doi.org/10.1186/s13636-024-00342-w

Published: 11 April 2024
Citations: 0
Multi-rate modulation encoding via unsupervised learning for audio event detection
IF 2.4 Zone 3 (Computer Science) Q2 ACOUSTICS Pub Date: 2024-04-01 DOI: 10.1186/s13636-024-00339-5
Sandeep Reddy Kothinti, Mounya Elhilali
Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events in an audio recording. Effective AED techniques rely heavily on supervised or semi-supervised models to capture the wide range of dynamics spanned by sound events in order to achieve temporally precise boundaries and accurate event classification. These methods require extensive collections of labeled or weakly labeled in-domain data, which is costly and labor-intensive. Importantly, these approaches do not fully leverage the inherent variability and range of dynamics across sound events, aspects that can be effectively identified through unsupervised methods. The present work proposes an approach based on multi-rate autoencoders that are pretrained in an unsupervised way to leverage unlabeled audio data and ultimately learn the rich temporal dynamics inherent in natural sound events. This approach utilizes parallel autoencoders that achieve decompositions of the modulation spectrum along different bands. In addition, we introduce a rate-selective temporal contrastive loss to align the training objective with event detection metrics. Optimizing the configuration of multi-rate encoders and the temporal contrastive loss leads to notable improvements in domestic sound event detection in the context of the DCASE challenge.
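To make "decompositions of the modulation spectrum along different bands" concrete, the sketch below splits a temporal envelope into slow and fast modulation bands with a plain FFT mask. It is an illustration of the signal-processing idea only and swaps in fixed FFT band-splitting for the learned parallel autoencoders the abstract describes; the function name, the 100 Hz frame rate, and the 8 Hz band edge are assumptions.

```python
import numpy as np

def split_modulation_bands(envelope, frame_rate, band_edges):
    """Split a temporal envelope into modulation-frequency bands.

    envelope   : 1-D array of frame-level values (e.g. one spectrogram channel)
    frame_rate : frames per second of the envelope (assumed known)
    band_edges : increasing edges in Hz; band i covers [edges[i], edges[i+1])
    Returns an array of shape (num_bands, num_frames).
    """
    envelope = np.asarray(envelope, dtype=float)
    n = envelope.shape[-1]
    spectrum = np.fft.rfft(envelope)                 # modulation spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate)   # modulation frequencies in Hz
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        keep = (freqs >= lo) & (freqs < hi)          # select one band
        bands.append(np.fft.irfft(spectrum * keep, n=n))
    return np.stack(bands)

# Toy envelope: a slow 2 Hz modulation plus a faster 20 Hz modulation.
frame_rate = 100.0
t = np.arange(200) / frame_rate
slow = np.sin(2 * np.pi * 2.0 * t)
fast = 0.5 * np.sin(2 * np.pi * 20.0 * t)
bands = split_modulation_bands(slow + fast, frame_rate, [0.0, 8.0, np.inf])
```

Each row of `bands` isolates one range of temporal dynamics; a multi-rate model would process such ranges with separate encoders instead of a fixed mask.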
Citations: 0