
Latest Publications in IEEE/ACM Transactions on Audio, Speech, and Language Processing

PHAIN: Audio Inpainting via Phase-Aware Optimization With Instantaneous Frequency
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-18 | DOI: 10.1109/TASLP.2024.3463415 | Vol. 32, pp. 4471-4485
Tomoro Tanaka;Kohei Yatabe;Yasuhiro Oikawa
Audio inpainting restores locally corrupted parts of digital audio signals. Sparsity-based methods achieve this by promoting sparsity in the time-frequency (T-F) domain, assuming short-time audio segments consist of a few sinusoids. However, such sparsity promotion reduces the magnitudes of the resulting waveforms; moreover, it often ignores the temporal connections of sinusoidal components. To address these problems, we propose a novel phase-aware audio inpainting method. Our method minimizes the time variations of a particular T-F representation calculated using the time derivative of the phase. This promotes sinusoidal components that coherently fit in the corrupted parts without directly suppressing the magnitudes. Both objective and subjective experiments confirmed the superiority of the proposed method compared with state-of-the-art methods.
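To make the core idea concrete, here is a minimal Python sketch of a PHAIN-style penalty: the instantaneous frequency (time derivative of the STFT phase) is estimated from a reference signal, and a candidate signal is scored by how much its demodulated T-F representation varies from frame to frame. The STFT settings, the simple phase-difference IF estimate, and the l1 penalty are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np
from scipy.signal import stft

def estimate_phase_increments(x_ref, fs, nperseg=1024, hop=256):
    """Per-bin phase increments between frames (a discrete time derivative of the phase)."""
    _, _, X = stft(x_ref, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    return np.diff(np.unwrap(np.angle(X), axis=1), axis=1)          # (freq, frames-1)

def phase_aware_variation(x, dphi, fs, nperseg=1024, hop=256):
    """Time variation of the T-F representation of x after demodulating by dphi."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    # Rotate each frame by the accumulated reference phase increments, so sinusoids
    # that continue coherently through the gap become nearly constant over time.
    Y = X[:, 1:] * np.exp(-1j * np.cumsum(dphi, axis=1))
    return np.abs(np.diff(Y, axis=1)).sum()

# Conceptual usage: estimate dphi from the observed (partially corrupted) signal, then
# optimize the missing samples of x so this penalty is small while the intact samples
# are kept fixed.
fs = 16000
x = np.random.default_rng(0).standard_normal(fs)                     # 1 s stand-in signal
dphi = estimate_phase_increments(x, fs)
print(phase_aware_variation(x, dphi, fs))
```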
Citations: 0
AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-17 | DOI: 10.1109/TASLP.2024.3446232 | Vol. 32, pp. 4526-4536
Sagar Dutta;Vipul Arora
This work presents a supervised deep hashing method for retrieving similar audio events. The proposed method, named AudioNet, is a deep-learning-based system for efficient hashing and retrieval of similar audio events using an audio example as a query. AudioNet achieves high retrieval performance on multiple standard datasets by generating binary hash codes for similar audio events, setting new benchmarks in the field and highlighting its efficacy and effectiveness compared to other hashing methods. Through comprehensive experiments on standard datasets, our research represents a pioneering effort in evaluating the retrieval performance of similar audio events. A novel loss function is proposed which incorporates weighted contrastive and weighted pairwise loss along with hashcode balancing to improve the efficiency of audio event retrieval. The method adopts discrete gradient propagation, which allows gradients to be propagated through discrete variables during backpropagation. This enables the network to optimize the discrete hash codes using standard gradient-based optimization algorithms, which are typically used for continuous variables. The proposed method showcases promising retrieval performance, as evidenced by the experimental results, even when dealing with imbalanced datasets. The systematic analysis conducted in this study further supports the significant benefits of the proposed method in retrieval performance across multiple datasets. The findings presented in this work establish a baseline for future studies on the efficient retrieval of similar audio events using deep audio embeddings.
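The "discrete gradient propagation" the abstract describes is commonly realized with a straight-through estimator; the sketch below shows one way this can look in PyTorch. The projection layer, the 48-bit code length, and the toy pairwise objective are illustrative assumptions, not AudioNet's actual architecture or loss.

```python
import torch
import torch.nn as nn

class BinaryHasher(nn.Module):
    def __init__(self, in_dim=128, n_bits=48):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_bits)

    def forward(self, features):
        h = torch.tanh(self.proj(features))   # continuous relaxation in (-1, 1)
        b = torch.sign(h)                     # discrete hash code in {-1, +1}
        # Straight-through estimator: the forward pass outputs the discrete code,
        # while the backward pass treats sign() as the identity so gradients reach h.
        return h + (b - h).detach()

hasher = BinaryHasher()
codes = hasher(torch.randn(4, 128))           # (4, 48), entries in {-1, +1}
# Toy pairwise objective: pull the first pair of codes together, push the second apart;
# gradients still flow into self.proj despite the discrete quantization.
loss = -(codes[0] @ codes[1]) + (codes[2] @ codes[3])
loss.backward()
```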
Citations: 0
Multi-Task Multi-Attention Transformer for Generative Named Entity Recognition
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-12 | DOI: 10.1109/TASLP.2024.3458796 | Vol. 32, pp. 4171-4183
Ying Mo;Jiahao Liu;Hongyin Tang;Qifan Wang;Zenglin Xu;Jingang Wang;Xiaojun Quan;Wei Wu;Zhoujun Li
Most previous sequential labeling models are task-specific, while recent years have witnessed the rise of generative models due to the advantage of unifying all named entity recognition (NER) tasks into the encoder-decoder framework. Although achieving promising performance, our pilot studies demonstrate that existing generative models are ineffective at detecting entity boundaries and estimating entity types. In this paper, we propose a multi-task Transformer, which incorporates an entity boundary detection task into the named entity recognition task. More concretely, we achieve entity boundary detection by classifying the relations between tokens within the sentence. To improve the accuracy of entity-type mapping during decoding, we adopt an external knowledge base to calculate the prior entity-type distributions and then incorporate the information into the model via the self- and cross-attention mechanisms. We perform experiments on extensive NER benchmarks, including flat, nested, and discontinuous NER datasets involving long entities. Our model improves $F_1$ scores by nearly $+0.3 \sim +1.5$ across a broad spectrum of benchmarks, or performs on par with the best generative NER model. Experimental results show that our approach improves the performance of the generative NER model considerably.
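As a rough illustration of boundary detection via token-pair relation classification, the sketch below scores every ordered token pair with a small bilinear classifier (e.g. "is a valid entity span" vs. not). The hidden size, the bilinear scorer, and the two relation labels are assumptions for illustration only, not the paper's exact module.

```python
import torch
import torch.nn as nn

class PairwiseBoundaryScorer(nn.Module):
    def __init__(self, hidden=256, n_relations=2):
        super().__init__()
        self.head = nn.Linear(hidden, hidden)     # token viewed as candidate span start
        self.tail = nn.Linear(hidden, hidden)     # token viewed as candidate span end
        self.bilinear = nn.Bilinear(hidden, hidden, n_relations)

    def forward(self, token_states):              # (batch, seq_len, hidden)
        h = torch.relu(self.head(token_states))
        t = torch.relu(self.tail(token_states))
        B, L, H = h.shape
        h_pair = h.unsqueeze(2).expand(B, L, L, H).reshape(-1, H)
        t_pair = t.unsqueeze(1).expand(B, L, L, H).reshape(-1, H)
        # One relation logit vector per ordered (start, end) token pair.
        return self.bilinear(h_pair, t_pair).view(B, L, L, -1)

scores = PairwiseBoundaryScorer()(torch.randn(2, 16, 256))   # (2, 16, 16, 2)
```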
Citations: 0
Filtered-X Quasi Affine Projection Algorithm for Active Noise Control Networks
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-12 | DOI: 10.1109/TASLP.2024.3458806 | Vol. 32, pp. 4237-4252
Miguel Ferrer;María de Diego;Alberto Gonzalez
The affine projection (AP) algorithm enhances the performance of gradient-based adaptive algorithms when dealing with colored reference signals, which is typically the case with filtered-X type algorithms. This enhancement is achieved by using various delayed versions of the reference signal data vector, which are appropriately orthogonalized and normalized to optimize convergence performance. The number of these vectors, known as the projection order of the AP, increases the computational requirements, mainly due to the calculation of a matrix inversion whose dimensions are proportional to this projection order. When used in distributed systems, the AP algorithm typically requires each acoustic node in the system to compute the complete matrix inversion, even though they only need a specific set of data (a subblock) from it. This means that the AP does not offer much advantage in terms of computational savings when used in distributed collaborative networks. To address this issue, an approximate version of the filtered-X affine projection (FXAP) algorithm is introduced in this work. This approximate version avoids the matrix inversion computation in each iteration using a precalculated inverse matrix. This strategy provides computational savings and enables easy distribution of the algorithm. Additionally, a variable step-size approach is proposed to mitigate the deviation caused by a precalculated matrix, which provides good performance, high robustness, and cost-effective distribution.
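The computational idea can be sketched in a few lines: standard AP inverts a P x P matrix at every iteration, whereas the "quasi" variant reuses a precalculated inverse. The sketch below omits the secondary-path (filtered-X) part and uses an illustrative filter length, projection order, step size, and regularization constant.

```python
import numpy as np

def quasi_ap_update(w, X_block, d_block, R_inv, mu=0.5):
    """One adaptive-filter update with a fixed, precalculated inverse.
    w       : (L,)   current filter coefficients
    X_block : (P, L) last P reference-signal vectors (P = projection order)
    d_block : (P,)   last P desired-signal samples
    R_inv   : (P, P) precalculated inverse of (X X^T + delta*I), reused every iteration
    """
    e = d_block - X_block @ w                 # a-priori errors for the P constraints
    return w + mu * X_block.T @ (R_inv @ e)

# Standard AP would recompute np.linalg.inv(X_block @ X_block.T + delta * np.eye(P))
# inside every call; reusing R_inv trades exactness for computational savings, and the
# paper compensates the resulting deviation with a variable step size.
rng = np.random.default_rng(0)
L, P, delta = 32, 4, 1e-3
X = rng.standard_normal((P, L))
R_inv = np.linalg.inv(X @ X.T + delta * np.eye(P))
w = quasi_ap_update(np.zeros(L), X, rng.standard_normal(P), R_inv)
```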
Citations: 0
Deep Kronecker Product Beamforming for Large-Scale Microphone Arrays
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-12 | DOI: 10.1109/TASLP.2024.3459430 | Vol. 32, pp. 4537-4553
Weixin Meng;Xiaoyu Li;Andong Li;Xiaoxue Luo;Shefeng Yan;Xiaodong Li;Chengshi Zheng
Although deep learning based beamformers have achieved promising performance using small microphone arrays, they suffer from performance degradation in very challenging environments, such as extremely low Signal-to-Noise Ratio (SNR) environments, e.g., SNR $\le -10$ dB. A large-scale microphone array with dozens or hundreds of microphones can improve the performance of beamformers in these challenging scenarios because of its high spatial resolution. However, a dramatic increase in the number of microphones leads to feature redundancy, causing difficulties in feature extraction and network training. As an attempt to improve the performance of deep beamformers for speech extraction in very challenging scenarios, this paper proposes a novel all-neural Kronecker product beamformer, denoted ANKP-BF, for large-scale microphone arrays by taking the following two aspects into account. Firstly, a larger microphone array can provide higher spatial-filtering performance than a small microphone array, and deep neural networks are introduced for their powerful non-linear modeling capability in the speech extraction task. Secondly, the feature redundancy problem is solved by introducing the Kronecker product rule to decompose the original high-dimensional weight vector into the Kronecker product of two much lower-dimensional weight vectors. The proposed ANKP-BF is designed to operate in an end-to-end manner. Extensive experiments are conducted on simulated large-scale microphone-array signals using the DNS-Challenge corpus and WSJ0-SI84 corpus, and real recordings in a semi-anechoic room and outdoor scenes are also used to evaluate and compare the performance of different methods. Quantitative results demonstrate that the proposed method outperforms existing advanced baselines in terms of multiple objective metrics, especially in very low SNR environments.
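The parameter saving behind the Kronecker decomposition is easy to see in a short sketch: a length-N weight vector (N = N1 * N2 microphones) constrained to be the Kronecker product of two short vectors has only N1 + N2 free parameters, and the beamformer output can be computed in factored form. The 8 x 16 array factorization below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2 = 8, 16                         # e.g. 128 microphones factored as 8 x 16
w1 = rng.standard_normal(N1) + 1j * rng.standard_normal(N1)
w2 = rng.standard_normal(N2) + 1j * rng.standard_normal(N2)

w = np.kron(w1, w2)                    # full length-128 weight vector with rank-1 structure
x = rng.standard_normal(N1 * N2) + 1j * rng.standard_normal(N1 * N2)   # microphone snapshot

# Beamformer output w^H x, computed directly and via the factored form: reshaping the
# snapshot to (N1, N2) lets the two short filters be applied one after the other.
y_full = np.vdot(w, x)
y_fact = np.vdot(w1, x.reshape(N1, N2) @ w2.conj())
assert np.allclose(y_full, y_fact)
```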
Citations: 0
Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-12 | DOI: 10.1109/TASLP.2024.3451977 | Vol. 32, pp. 4121-4133
Shen-sian Syu;Juncheng Xie;Hung-yi Lee
Non-autoregressive approaches, especially those that generate output in a one-pass forward manner, have shown great potential in improving the inference speed of translation models. However, these approaches often suffer from a significant drop in translation quality compared to autoregressive models (AT). To tackle this challenge, this paper introduces a series of innovative techniques to enhance the translation quality of non-autoregressive neural machine translation (NAT) models while still maintaining a substantial acceleration in inference speed. Specifically, we propose a method called CTCPMLM, which involves fine-tuning Pretrained Multilingual Language Models (PMLMs) with the Connectionist Temporal Classification (CTC) loss to effectively train NAT models. Additionally, we adopt the MASK insertion scheme instead of token duplication for up-sampling and present an embedding distillation method to further enhance the performance of NAT models. In our experiments, CTCPMLM surpasses the performance of the baseline autoregressive model (Transformer base) on various datasets, including WMT'14 DE $\leftrightarrow$ EN, WMT'16 RO $\leftrightarrow$ EN, and IWSLT'14 DE $\leftrightarrow$ EN. Moreover, CTCPMLM represents the current state-of-the-art among NAT models. Notably, our model achieves superior results compared to the baseline autoregressive model on the IWSLT'14 En $\leftrightarrow$ De and WMT'16 En $\leftrightarrow$ Ro datasets, even without using distillation data during training. Particularly, on the IWSLT'14 DE $\rightarrow$ EN dataset, our model achieves an impressive BLEU score of 39.93, surpassing AT models and establishing a new state-of-the-art. Additionally, our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
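The difference between token duplication and MASK insertion for CTC up-sampling can be illustrated in a few lines. The token ids, the [MASK] id, and the fixed 2x up-sampling ratio below are assumptions for illustration; the paper's actual ratio and vocabulary follow the pretrained PMLM.

```python
MASK_ID = 103   # assumed [MASK] token id

def upsample_duplicate(src_ids, ratio=2):
    # Classic CTC up-sampling: each source token is repeated `ratio` times.
    return [tok for tok in src_ids for _ in range(ratio)]

def upsample_mask_insertion(src_ids, ratio=2):
    # MASK insertion: keep each source token once and fill the remaining positions
    # with [MASK], which a pretrained masked LM is well suited to complete.
    out = []
    for tok in src_ids:
        out.append(tok)
        out.extend([MASK_ID] * (ratio - 1))
    return out

src = [7, 42, 9]
print(upsample_duplicate(src))        # [7, 7, 42, 42, 9, 9]
print(upsample_mask_insertion(src))   # [7, 103, 42, 103, 9, 103]
```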
Citations: 0
TriSAT: Trimodal Representation Learning for Multimodal Sentiment Analysis
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-11 | DOI: 10.1109/TASLP.2024.3458812 | Vol. 32, pp. 4105-4120
Ruohong Huan;Guowei Zhong;Peng Chen;Ronghua Liang
Transformer-based multimodal sentiment analysis frameworks commonly facilitate cross-modal interactions between two modalities through the attention mechanism. However, such interactions prove inadequate when dealing with three or more modalities, leading to increased computational complexity and network redundancy. To address this challenge, this paper introduces a novel framework, Trimodal representations for Sentiment Analysis from Transformers (TriSAT), tailored for multimodal sentiment analysis. TriSAT incorporates a trimodal transformer featuring a module called Trimodal Multi-Head Attention (TMHA). TMHA considers language as the primary modality, combines information from language, video, and audio using a single computation, and analyzes sentiment from a trimodal perspective. This approach significantly reduces the computational complexity while delivering high performance. Moreover, we propose Attraction-Repulsion (AR) loss and Trimodal Supervised Contrastive (TSC) loss to further enhance sentiment analysis performance. We conduct experiments on three public datasets to evaluate TriSAT's performance, which consistently demonstrates its competitiveness compared to state-of-the-art approaches.
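One way to read "combines information from language, video, and audio using a single computation" is a single attention call in which language provides the queries and the concatenation of all three modalities provides the keys and values, so one softmax spans all modalities. The sketch below is that illustrative reading, not the exact TMHA module; dimensions and sequence lengths are assumptions.

```python
import torch
import torch.nn as nn

d_model, n_heads = 128, 4
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

lang  = torch.randn(2, 20, d_model)    # (batch, lang_len, d) -- primary modality
video = torch.randn(2, 50, d_model)
audio = torch.randn(2, 80, d_model)

# Keys/values are the concatenation of all three modalities, so a single attention
# distribution is spread jointly over language, video and audio positions.
memory = torch.cat([lang, video, audio], dim=1)
fused, _ = attn(query=lang, key=memory, value=memory)
print(fused.shape)                      # torch.Size([2, 20, 128])
```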
Citations: 0
Spherically Steerable Vector Differential Microphone Arrays
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-10 | DOI: 10.1109/TASLP.2024.3458799 | Vol. 32, pp. 4342-4354
Hüseyin Hacıhabiboğlu
Differential microphone arrays (DMAs) use multiple omnidirectional microphones for synthesising higher-order microphone directivity patterns. In their most basic form, they can be used to obtain fixed-directivity or horizontally steerable beamformers that can satisfy certain constraints. We propose a vector differential microphone array (VDMA) which is frequency- and direction-invariantly steerable in three dimensions. The proposed design comprises pressure and particle velocity sensors positioned on a circular constellation in a plane and allows extracting the third-order spherical harmonic decomposition of the sound field. This decomposition can then be used to obtain spherically direction-invariant steered beams. Synthesis of a maximum directivity factor (MaxDF) directivity pattern is demonstrated. A closed-form expression for the proposed array's white noise gain (WNG) is derived. The robustness of the proposed design to noise is analysed.
Citations: 0
Self-Supervised Learning of Spatial Acoustic Representation With Cross-Channel Signal Reconstruction and Multi-Channel Conformer
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-10 | DOI: 10.1109/TASLP.2024.3458811 | Vol. 32, pp. 4211-4225
Bing Yang;Xiaofei Li
Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality generalization problem due to the mismatch between simulated and real-world acoustic characteristics and the deficiency of annotated real-world data. To this end, this work proposes a self-supervised method that takes full advantage of unlabeled data for spatial acoustic parameter estimation. First, a new pretext task, i.e. cross-channel signal reconstruction (CCSR), is designed to learn a universal spatial acoustic representation from unlabeled multi-channel microphone signals. We mask partial signals of one channel and ask the model to reconstruct them, which makes it possible to learn spatial acoustic information from unmasked signals and extract source information from the other microphone channel. An encoder-decoder structure is used to disentangle the two kinds of information. By fine-tuning the pre-trained spatial encoder with a small annotated dataset, this encoder can be used to estimate spatial acoustic parameters. Second, a novel multi-channel audio Conformer (MC-Conformer) is adopted as the encoder model architecture, which is suitable for both the pretext and downstream tasks. It is carefully designed to be able to capture the local and global characteristics of spatial acoustics exhibited in the time-frequency domain. Experimental results of five acoustic parameter estimation tasks on both simulated and real-world data show the effectiveness of the proposed method. To the best of our knowledge, this is the first self-supervised learning method in the field of spatial acoustic representation learning and multi-channel audio signal processing.
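The data preparation for the CCSR pretext task can be sketched directly: random segments of one channel are hidden, and the reconstruction target is the hidden samples, to be predicted from the unmasked remainder and the other channel. The segment length, number of segments, and zero-filling below are assumptions for illustration.

```python
import numpy as np

def make_ccsr_example(multich, masked_ch=0, seg_len=400, n_segs=5, rng=None):
    """multich: (n_channels, n_samples). Returns (masked_input, target, mask)."""
    if rng is None:
        rng = np.random.default_rng()
    x = multich.copy()
    mask = np.zeros(multich.shape[1], dtype=bool)
    for _ in range(n_segs):
        start = rng.integers(0, multich.shape[1] - seg_len)
        mask[start:start + seg_len] = True
    x[masked_ch, mask] = 0.0                  # hide segments of one channel only
    target = multich[masked_ch, mask]         # samples the encoder-decoder must reconstruct
    return x, target, mask

sig = np.random.default_rng(0).standard_normal((2, 16000))   # 2-channel, 1 s at 16 kHz
masked_input, target, mask = make_ccsr_example(sig)
```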
Citations: 0
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-09-06 | DOI: 10.1109/TASLP.2024.3451951 | Vol. 32, pp. 4036-4051
Cheng Gong;Xin Wang;Erica Cooper;Dan Wells;Longbiao Wang;Jianwu Dang;Korin Richmond;Junichi Yamagishi
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voice, but there is growing interest in developing systems that can synthesize voices for new speakers using only a few seconds of their speech. This paper presents ZMM-TTS, a multilingual and multispeaker framework utilizing quantized latent speech representations from a large-scale, pre-trained, self-supervised model. Our paper combines text-based and speech-based self-supervised learning models for multilingual speech synthesis. Our proposed model has zero-shot generalization ability not only for unseen speakers but also for unseen languages. We have conducted comprehensive subjective and objective evaluations through a series of experiments. Our model has proven effective in terms of speech naturalness and similarity for both seen and unseen speakers in six high-resource languages. We also tested the efficiency of our method on two hypothetically low-resource languages. The results are promising, indicating that our proposed approach can synthesize audio that is intelligible and has a high degree of similarity to the target speaker's voice, even without any training data for the new, unseen language.
Citations: 0