
IEEE/ACM Transactions on Audio, Speech, and Language Processing: Latest Publications

Cross Domain Optimization for Speech Enhancement: Parallel or Cascade?
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-26 | DOI: 10.1109/TASLP.2024.3468026
Liang Wan;Hongqing Liu;Liming Shi;Yi Zhou;Lu Gan
This paper introduces five novel deep-learning architectures for speech enhancement. Existing methods typically operate on time-domain waveforms, time-frequency representations, or a hybrid of the two. Recognizing the unique contributions of each domain to feature extraction and model design, this study investigates the integration of waveform and complex spectrogram models through cross-domain fusion to enhance speech feature learning and noise reduction, thereby improving speech quality. We examine both cascading and parallel configurations of waveform and complex spectrogram models to assess their effectiveness in speech enhancement. Additionally, we employ an orthogonal projection-based error decomposition technique and manage the inputs of individual sub-models to analyze factors affecting speech quality. The network is trained by optimizing three specific loss functions applied across all sub-models. Our experiments, using the DNS Challenge (ICASSP 2021) dataset, reveal that the proposed models surpass existing benchmarks in speech enhancement, offering superior speech quality and intelligibility. These results highlight the efficacy of our cross-domain fusion strategy.
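To make the cascade versus parallel distinction concrete, here is a minimal PyTorch sketch; it is not the authors' architecture, and the sub-model sizes, the complex-mask spectrogram branch, and the averaging fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WaveBranch(nn.Module):
    """Toy time-domain sub-model: a small 1-D conv stack on the raw waveform."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, ch, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(ch, 1, kernel_size=9, padding=4),
        )

    def forward(self, wav):                      # wav: (batch, samples)
        return self.net(wav.unsqueeze(1)).squeeze(1)

class SpecBranch(nn.Module):
    """Toy complex-spectrogram sub-model: predicts a complex mask applied to the STFT."""
    def __init__(self, n_fft=512, hop=128):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.register_buffer("window", torch.hann_window(n_fft))
        self.net = nn.Conv2d(2, 2, kernel_size=3, padding=1)   # real/imag in, real/imag out

    def forward(self, wav):
        spec = torch.stft(wav, self.n_fft, self.hop, window=self.window,
                          return_complex=True)                 # (batch, freq, frames)
        m = self.net(torch.stack([spec.real, spec.imag], dim=1))
        est = torch.complex(m[:, 0], m[:, 1]) * spec           # complex mask multiplication
        return torch.istft(est, self.n_fft, self.hop, window=self.window,
                           length=wav.shape[-1])

def cascade(wav, wave_model, spec_model):
    """Cascade: the waveform model's output is refined by the spectrogram model."""
    return spec_model(wave_model(wav))

def parallel(wav, wave_model, spec_model):
    """Parallel: both sub-models see the noisy input; outputs are fused (here: averaged)."""
    return 0.5 * (wave_model(wav) + spec_model(wav))

noisy = torch.randn(2, 16000)                    # two 1-second clips at 16 kHz
print(cascade(noisy, WaveBranch(), SpecBranch()).shape,
      parallel(noisy, WaveBranch(), SpecBranch()).shape)
```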
Citations: 0
Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-25 | DOI: 10.1109/TASLP.2024.3467951
Juliano G. C. Ribeiro;Shoichi Koyama;Ryosuke Horiuchi;Hiroshi Saruwatari
A sound field estimation method based on kernel interpolation with an adaptive kernel function is proposed. The kernel-interpolation-based sound field estimation methods enable physics-constrained interpolation from pressure measurements of distributed microphones with a linear estimator, which constrains interpolation functions to satisfy the Helmholtz equation. However, a fixed kernel function would not be capable of adapting to the acoustic environment in which the measurement is performed, limiting their applicability. To make the kernel function adaptive, we represent it with a sum of directed and residual trainable kernel functions. The directed kernel is defined by a weight function composed of a superposition of exponential functions to capture highly directional components. The weight function for the residual kernel is represented by neural networks to capture unpredictable spatial patterns of the residual components. Experimental results using simulated and real data indicate that the proposed method outperforms the current kernel-interpolation-based methods and a method based on physics-informed neural networks.
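For orientation, the fixed-kernel baseline that the adaptive kernel generalizes is plain kernel ridge regression with the Helmholtz-compatible kernel j0(k * ||r1 - r2||), the zeroth-order spherical Bessel function. The sketch below implements only that baseline with invented microphone and source positions; the directional and neural-network kernel weighting proposed in the paper is not reproduced.

```python
import numpy as np

def sinc_kernel(r1, r2, k):
    """Helmholtz-compatible kernel j0(k * |r1 - r2|) between 3-D positions (..., 3)."""
    d = np.linalg.norm(r1[:, None, :] - r2[None, :, :], axis=-1)
    return np.sinc(k * d / np.pi)                # np.sinc(x) = sin(pi x) / (pi x)

def fit_sound_field(mic_pos, pressures, k, reg=1e-3):
    """Kernel ridge regression: alpha such that p(r) ~ sum_m alpha_m * kappa(r, r_m)."""
    K = sinc_kernel(mic_pos, mic_pos, k)
    return np.linalg.solve(K + reg * np.eye(len(mic_pos)), pressures)

def estimate(query_pos, mic_pos, alpha, k):
    return sinc_kernel(query_pos, mic_pos, k) @ alpha

# toy usage: 32 microphones in a 1 m cube, a point source outside it, 500 Hz
rng = np.random.default_rng(0)
k = 2 * np.pi * 500.0 / 343.0                    # wavenumber at 500 Hz, c = 343 m/s
mics = rng.uniform(-0.5, 0.5, size=(32, 3))
src = np.array([2.0, 0.3, -0.1])
dist = np.linalg.norm(mics - src, axis=1)
p = np.exp(1j * k * dist) / dist                 # free-field point-source pressures
alpha = fit_sound_field(mics, p, k)
print(np.abs(estimate(rng.uniform(-0.5, 0.5, (5, 3)), mics, alpha, k)))
```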
Citations: 0
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoders
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-25 | DOI: 10.1109/TASLP.2024.3468005
Yicheng Gu;Xueyao Zhang;Liumeng Xue;Haizhou Li;Zhizheng Wu
Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in the Short-Time Fourier Transform (STFT), which has a constant Time-Frequency (TF) resolution, linearly scaled center frequencies, and a fixed decomposition basis, making it incompatible with signals like singing voices that require dynamic attention for different frequency bands and different time intervals. Motivated by this, we propose a Multi-Scale Sub-Band Constant-Q Transform (CQT) discriminator (MS-SB-CQT) and a Multi-Scale Temporal-Compressed Continuous Wavelet Transform (CWT) discriminator (MS-TC-CWT). Both CQT and CWT have a dynamic TF resolution for different frequency bands. Between the two, CQT is better at modeling pitch information, while CWT is better at modeling short-time transients. Experiments conducted on both speech and singing voices confirm the effectiveness of our proposed discriminators. Moreover, the STFT-, CQT-, and CWT-based discriminators can be used jointly for better performance. The proposed discriminators can boost the synthesis quality of various state-of-the-art GAN-based vocoders, including HiFi-GAN, BigVGAN, and APNet.
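As a toy illustration of a CQT-based discriminator (not the MS-SB-CQT or MS-TC-CWT designs; the bin counts, layer sizes, and patch-wise logits are arbitrary assumptions), librosa can supply the constant-Q transform whose real and imaginary parts feed a small 2-D convolutional stack:

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

class CQTDiscriminator(nn.Module):
    """Toy CQT discriminator: 2-D convs over real/imag CQT channels, patch-wise logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, (3, 9), stride=(1, 2), padding=(1, 4)), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 32, (3, 9), stride=(1, 2), padding=(1, 4)), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, (3, 3), padding=1),
        )

    def forward(self, wav, sr=22050, n_bins=84, bins_per_octave=12):
        feats = []
        for w in wav:                                          # wav: (batch, samples)
            c = librosa.cqt(w.numpy(), sr=sr, n_bins=n_bins,
                            bins_per_octave=bins_per_octave)   # complex (n_bins, frames)
            feats.append(np.stack([c.real, c.imag]))
        x = torch.from_numpy(np.stack(feats)).float()          # (batch, 2, n_bins, frames)
        return self.net(x)

d = CQTDiscriminator()
print(d(torch.randn(2, 22050)).shape)                          # two 1-second clips
```

In a GAN setup, these patch-wise logits would enter the usual adversarial loss (for example a least-squares GAN objective) alongside the existing STFT-based discriminators.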
Citations: 0
Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-25 | DOI: 10.1109/TASLP.2024.3468025
Lu Li;Maoshen Jia;Changchun Bao
This study proposes a three-dimensional room transfer function (RTF) parameterization method based on multiple concentric planar circular arrays, which exhibits robustness to variations in the positions of both the receiver and source. According to the harmonic solution to the wave equation, the RTFs between two spherical regions (sound source and receiver) in a room can be expressed as a weighted sum of spherical harmonics, whose weight coefficients serve as the RTF parameters, which can be estimated by placing multiple concentric planar circular arrays composed of monopole-source pairs (MSPs) and multiple concentric planar circular arrays composed of omnidirectional-microphone pairs (OMPs) in respective source and receiver regions. We use MSP arrays to generate required outgoing soundfields originating from a source region. We derive a method to use OMP arrays to estimate RTF parameters that are concealed within the captured soundfield, which can be employed to reconstruct the RTF from any point in the source region to any point in the receiver region. The accuracy of the RTF parameterization method is validated through simulation testing.
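The underlying parameterization, an interior sound field written as a truncated sum of spherical harmonics with coefficients fitted from pressure observations, can be sketched generically as below. This is a plain least-squares fit around the expansion center with an arbitrary truncation order, not the paper's MSP/OMP array procedure; positions, wavenumber, and order are illustrative.

```python
import numpy as np
from scipy.special import sph_harm, spherical_jn

def basis_matrix(positions, k, order):
    """Interior-field basis j_n(k r) * Y_n^m(azimuth, polar) at each position (M, 3)."""
    x, y, z = positions.T
    r = np.linalg.norm(positions, axis=1)
    azim = np.arctan2(y, x)
    polar = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))
    cols = []
    for n in range(order + 1):
        jn = spherical_jn(n, k * r)
        for m in range(-n, n + 1):
            cols.append(jn * sph_harm(m, n, azim, polar))      # scipy order: (m, n, azimuth, polar)
    return np.stack(cols, axis=1)                              # (M, (order+1)^2)

def fit_coefficients(mic_pos, pressures, k, order=4):
    """Least-squares estimate of the expansion weights (the 'parameters' in this toy)."""
    Phi = basis_matrix(mic_pos, k, order)
    coeffs, *_ = np.linalg.lstsq(Phi, pressures, rcond=None)
    return coeffs

rng = np.random.default_rng(1)
k = 2 * np.pi * 1000.0 / 343.0                                 # wavenumber at 1 kHz
mics = rng.uniform(-0.2, 0.2, size=(64, 3))                    # receivers near the expansion center
src = np.array([1.5, -0.4, 0.8])
dist = np.linalg.norm(mics - src, axis=1)
p = np.exp(1j * k * dist) / dist
print(fit_coefficients(mics, p, k).shape)                      # (25,) for order 4
```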
Citations: 0
On the Quantization of Neural Models for Speaker Verification
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-20 | DOI: 10.1109/TASLP.2024.3463430
Vishal Kumar;Vinayak Abrol;Mathew Magamai Doss
This paper addresses the sub-optimality of current post-training quantization (PTQ) and quantization-aware training (QAT) methods for state-of-the-art speaker verification (SV) models featuring intricate architectural elements such as channel aggregation and squeeze excitation modules. To address these limitations, we propose 1) a data-independent PTQ technique employing iterative low-precision calibration on pre-trained models; and 2) a data-dependent QAT method designed to reduce the performance gap between full-precision and integer models. Our QAT involves two progressive stages where FP-32 weights are initially transformed into FP-8, adapting precision based on the gradient norm, followed by the learning of quantizer parameters (scale and zero-point) for INT8 conversion. Experimental validation underscores the ingenuity of our method in model quantization, demonstrating reduced floating-point operations and INT8 inference time, all while maintaining performance on par with full-precision models.
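To ground the terminology, the sketch below shows generic 8-bit affine quantization plus the fake-quantization trick with a straight-through estimator that underlies QAT. It is a textbook, per-tensor illustration under assumed settings, not the paper's iterative low-precision calibration or its FP-8 intermediate stage.

```python
import torch

def quantize_tensor(x, num_bits=8):
    """Affine (asymmetric) quantization: integer tensor plus scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = (qmin - torch.round(x.min() / scale)).clamp(qmin, qmax)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_tensor(q, scale, zero_point):
    return scale * (q.float() - zero_point)

class FakeQuant(torch.autograd.Function):
    """Quantize-dequantize in the forward pass, straight-through gradient in the backward."""
    @staticmethod
    def forward(ctx, x):
        return dequantize_tensor(*quantize_tensor(x))
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                          # straight-through estimator

w = torch.randn(192, 512)                        # e.g. a speaker-embedding projection weight
print((w - FakeQuant.apply(w)).abs().max())      # error after an INT8 round trip
```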
Citations: 0
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463395
Haolin Chen;Philip N. Garner
We are motivated primarily by the adaptation of text-to-speech synthesis models; however we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with the low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and using the Kronecker-factored approximation produces a better preservation of the pre-training knowledge than the diagonal ones.
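A rough sketch of the core mechanism, penalizing the differentiable weight shift of a LoRA layer with a diagonal Laplace (EWC-style) precision estimated on the pre-training task, is given below. The rank, regularization weight, Fisher estimate, and layer sizes are assumptions for illustration, and the Kronecker-factored variant discussed in the paper is omitted.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (B @ A) x."""
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def delta_w(self):
        return self.B @ self.A                   # differentiable parameter shift

    def forward(self, x):
        return self.base(x) + x @ self.delta_w().T

def diagonal_laplace_penalty(layers, fishers, lam=1.0):
    """0.5 * lam * sum_i F_i * (delta_w_i)^2 around the pre-trained weights (delta_w = 0)."""
    return sum(0.5 * lam * (f * l.delta_w() ** 2).sum() for l, f in zip(layers, fishers))

layer = LoRALinear(nn.Linear(128, 128))
fisher = torch.rand(128, 128)                            # diagonal Fisher from pre-training data
task_loss = layer(torch.randn(4, 128)).pow(2).mean()     # stand-in fine-tuning objective
(task_loss + diagonal_laplace_penalty([layer], [fisher])).backward()
```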
Citations: 0
NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463498
Zexu Pan;Marvin Borsdorf;Siqi Cai;Tanja Schultz;Haizhou Li
Humans possess the remarkable ability to selectively attend to a single speaker amidst competing voices and background noise, known as selective auditory attention. Recent studies in auditory neuroscience indicate a strong correlation between the attended speech signal and the corresponding brain's elicited neuronal activities. In this work, we study such brain activities measured using affordable and non-intrusive electroencephalography (EEG) devices. We present NeuroHeed, a speaker extraction model that leverages the listener's synchronized EEG signals to extract the attended speech signal in a cocktail party scenario, in which the extraction process is conditioned on a neuronal attractor encoded from the EEG signal. We propose both an offline and an online NeuroHeed, with the latter designed for real-time inference. In the online NeuroHeed, we additionally propose an autoregressive speaker encoder, which accumulates past extracted speech signals for self-enrollment of the attended speaker information into an auditory attractor, that retains the attentional momentum over time. Online NeuroHeed extracts the current window of the speech signals with guidance from both attractors. Experimental results on KUL dataset two-speaker scenario demonstrate that NeuroHeed effectively extracts brain-attended speech signals with an average scale-invariant signal-to-noise ratio improvement (SI-SDRi) of 14.3 dB and extraction accuracy of 90.8% in offline settings, and SI-SDRi of 11.2 dB and extraction accuracy of 85.1% in online settings.
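As a rough illustration of how an EEG-derived attractor can condition a speaker extraction network, consider the toy model below. The encoder/decoder sizes, the GRU EEG encoder, and the multiplicative gating are invented for the sketch and do not reproduce NeuroHeed's architecture or its online autoregressive self-enrollment.

```python
import torch
import torch.nn as nn

class EEGConditionedExtractor(nn.Module):
    """Toy neuro-steered extractor: an EEG attractor gates a masking network."""
    def __init__(self, eeg_ch=64, feat=128):
        super().__init__()
        self.speech_enc = nn.Conv1d(1, feat, kernel_size=16, stride=8)
        self.eeg_enc = nn.GRU(eeg_ch, feat, batch_first=True)
        self.mask_net = nn.Sequential(
            nn.Conv1d(feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv1d(feat, feat, 3, padding=1), nn.Sigmoid(),
        )
        self.dec = nn.ConvTranspose1d(feat, 1, kernel_size=16, stride=8)

    def forward(self, mix, eeg):
        # mix: (batch, samples); eeg: (batch, frames, eeg_ch), time-aligned with the mixture
        h = self.speech_enc(mix.unsqueeze(1))            # (batch, feat, T')
        _, attractor = self.eeg_enc(eeg)                 # final GRU state: (1, batch, feat)
        attractor = attractor[-1].unsqueeze(-1)          # neuronal attractor: (batch, feat, 1)
        mask = self.mask_net(h * attractor)              # attractor steers the mask estimate
        return self.dec(h * mask).squeeze(1)             # estimate of the attended speech

model = EEGConditionedExtractor()
print(model(torch.randn(2, 16000), torch.randn(2, 128, 64)).shape)   # (2, 16000)
```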
Citations: 0
Automatic Detection of Speech Sound Disorder in Cantonese-Speaking Pre-School Children
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463503
Si-Ioi Ng;Cymie Wing-Yee Ng;Jiarui Wang;Tan Lee
Speech sound disorder (SSD) is a type of developmental disorder in which children encounter persistent difficulties in correctly producing certain speech sounds. Conventionally, assessment of SSD relies largely on speech and language pathologists (SLPs) with an appropriate language background. Given the unmet demand for qualified SLPs, automatic detection of SSD is highly desirable for assisting clinical work and improving the efficiency and quality of services. In this paper, methods and systems for fully automatic detection of SSD in young children are investigated. A microscopic approach and a macroscopic approach are developed. The microscopic system is based on detection of phonological errors in impaired child speech. A deep neural network (DNN) model is trained to learn the similarity and contrast between consonant segments. Phonological errors are identified by contrasting a test speech segment with reference segments. The phone-level similarity scores are aggregated for speaker-level SSD detection. The macroscopic approach leverages holistic changes in speech characteristics related to disorders. Various types of speaker-level embeddings are investigated and compared. Experimental results show that the proposed microscopic system achieves an unweighted average recall (UAR) of 84.0% to 91.9% on phone-level error detection. The proposed macroscopic approach can achieve a UAR of 89.0% on speaker-level SSD detection. The speaker embeddings adopted for macroscopic SSD detection can effectively discard information related to the speaker's personal identity.
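The microscopic pipeline, scoring each test consonant segment against reference segments, flagging weak matches as phonological errors, and aggregating to a speaker-level decision, can be sketched with placeholder embeddings. The cosine scoring and both thresholds are assumptions standing in for the trained DNN similarity model.

```python
import numpy as np

def phone_similarity(test_emb, ref_embs):
    """Cosine similarity of one test consonant-segment embedding against reference segments."""
    test = test_emb / np.linalg.norm(test_emb)
    refs = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    return refs @ test                               # one score per reference segment

def speaker_level_decision(segment_scores, error_thr=0.5, speaker_thr=0.2):
    """Flag a segment as an error when its best reference match is weak, then call the
    speaker disordered when the phone-level error rate exceeds a threshold."""
    errors = [max(scores) < error_thr for scores in segment_scores]
    return float(np.mean(errors)) > speaker_thr

rng = np.random.default_rng(0)                       # toy usage with random 256-d embeddings
segs = [phone_similarity(rng.normal(size=256), rng.normal(size=(5, 256))) for _ in range(20)]
print(speaker_level_decision(segs))
```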
Citations: 0
Audio-Visual Fusion With Temporal Convolutional Attention Network for Speech Separation
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463411
Debang Liu;Tianqi Zhang;Mads Græsbøll Christensen;Chen Yi;Zeliang An
Currently, audio-visual speech separation methods utilize the correlation between the speaker's audio and visual information to help separate the speech of the target speaker. However, these methods commonly fuse audio-visual features by concatenation followed by a linear mapping, which motivates a deeper exploration of audio-visual fusion. In this paper, guided by the speaker's mouth landmark movements during speech, we propose a novel time-domain single-channel audio-visual speech separation method: audio-visual fusion with temporal convolution attention network for speech separation (AVTCA). In this method, we design a temporal convolution attention network (TCANet) based on the attention mechanism to model the contextual relationships between audio and visual sequences, and use TCANet as the basic unit to construct the sequence learning and fusion network. In the whole deep separation framework, we first use cross attention to focus on the cross-correlation information of the audio and visual sequences, and then use TCANet to fuse the audio-visual feature sequences with temporal dependencies and cross-correlations. The fused audio-visual feature sequences are then used as input to the separation network to predict masks and separate each speaker's source. Finally, comparative experiments on the Vox2, GRID, LRS2 and TCD-TIMIT datasets indicate that AVTCA outperforms other state-of-the-art (SOTA) separation methods, while being more efficient in both computation and model size.
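A minimal sketch of the fusion pattern described here, cross-attention with audio features as queries and visual features as keys/values followed by a temporal convolution block, is shown below. The dimensions, head count, and depthwise/pointwise block are illustrative stand-ins rather than the AVTCA/TCANet definition.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy audio-visual fusion: cross-attention then a depthwise temporal conv block."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim),  # depthwise over time
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=1),                         # pointwise mixing
        )

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, T_audio, dim); visual_feats: (batch, T_video, dim)
        fused, _ = self.cross_attn(audio_feats, visual_feats, visual_feats)
        fused = fused + audio_feats                   # residual on the audio stream
        out = self.temporal(fused.transpose(1, 2)).transpose(1, 2)
        return out + fused

fusion = CrossModalFusion()
print(fusion(torch.randn(2, 200, 256), torch.randn(2, 50, 256)).shape)   # (2, 200, 256)
```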
Citations: 0
Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463491
Jeong-Hwan Choi;Joon-Young Yang;Joon-Hyuk Chang
Developing a lightweight speaker embedding extractor (SEE) is crucial for the practical implementation of automatic speaker verification (ASV) systems. To this end, we recently introduced broadcasting convolutional neural networks (CNNs)-meet-vision-Transformers (BC-CMT), a lightweight SEE that utilizes broadcasted residual learning (BRL) within the hybrid CNN-Transformer architecture to maintain a small number of model parameters. We proposed three BC-CMT-based SEEs of different sizes: BC-CMT-Tiny, -Small, and -Base. In this study, we extend our previously proposed BC-CMT by introducing an improved model architecture and a training strategy based on knowledge distillation (KD) using self-attention (SA) maps. First, to reduce the computational costs and latency of the BC-CMT, the two-dimensional (2D) SA operations in the BC-CMT, which calculate the SA maps in the frequency-time dimensions, are simplified to 1D SA operations that consider only temporal importance. Moreover, to enhance the SA capability of the BC-CMT, the group convolution layers in the SA block are adjusted to have a smaller number of groups and are combined with the BRL operations. Second, to improve the training effectiveness of the modified BC-CMT-Tiny, the SA maps of a pretrained large BC-CMT-Base are used for the KD to guide those of a smaller BC-CMT-Tiny. Because the attention map sizes of the modified BC-CMT models do not depend on the number of frequency bins or convolution channels, the proposed strategy enables KD between feature maps of different sizes. The experimental results demonstrate that the proposed BC-CMT-Tiny model, having 271.44K model parameters, achieved a 36.8% reduction in floating-point operations on 1 s signals and a 9.3% reduction in equal error rate (EER) on the VoxCeleb1 test set, compared to the conventional BC-CMT-Tiny. The CPU and GPU running times of the proposed BC-CMT-Tiny for 1 to 10 s signals ranged from 29.07 to 146.32 ms and from 36.01 to 206.43 ms, respectively. The proposed KD further reduced the EER by 15.5% with improved attention capability.
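The distillation idea, matching time-by-time self-attention maps whose shape does not depend on channel width between a large teacher and a small student, can be sketched as follows. The random projections, KL objective, and sizes are assumptions for illustration, not the BC-CMT training recipe.

```python
import torch
import torch.nn.functional as F

def temporal_attention_map(x, w_q, w_k):
    """1-D (time-only) self-attention map softmax(Q K^T / sqrt(d)) over frames.
    x: (batch, frames, dim); w_q, w_k: (dim, d_attn) projections."""
    q, k = x @ w_q, x @ w_k
    return torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)

def attention_kd_loss(student_map, teacher_map):
    """KL divergence between student and teacher attention distributions per query frame.
    Both maps are (batch, frames, frames), so they match even when channel widths differ."""
    return F.kl_div(student_map.clamp_min(1e-8).log(), teacher_map, reduction="batchmean")

# teacher and student share the frame axis but not the channel width
x_teacher, x_student = torch.randn(2, 100, 256), torch.randn(2, 100, 64)
t_map = temporal_attention_map(x_teacher, torch.randn(256, 64), torch.randn(256, 64))
s_map = temporal_attention_map(x_student, torch.randn(64, 32), torch.randn(64, 32))
print(attention_kd_loss(s_map, t_map))
```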
Citations: 0