
Speech Communication: Latest Publications

Robustness of emotion recognition in dialogue systems: A study on third-party API integrations and black-box attacks
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-10-13 | DOI: 10.1016/j.specom.2025.103316
Fatma Gumus , M. Fatih Amasyali
There is an intricate interplay between third-party AI application programming interfaces and adversarial machine learning. The investigation centers on vulnerabilities inherent in AI models utilizing multiple black-box APIs, with a particular emphasis on their susceptibility to attacks in the domains of speech and text recognition. Our exploration spans a spectrum of attack strategies, encompassing targeted, indiscriminate, and adaptive targeting approaches, each carefully designed to exploit unique facets of multi-modal inputs. The results underscore the intricate balance between attack success, average target class confidence, and the density of swaps and queries. Remarkably, targeted attacks exhibit an average success rate of 76%, while adaptive targeting achieves an even higher rate of 88%. Conversely, indiscriminate attacks attain an intermediate success rate of 73%, highlighting their potency even in the absence of strategic tailoring. Moreover, our strategies’ efficiency is evaluated through a resource utilization lens. Our findings reveal adaptive targeting as the most efficient approach, with an average of 2 word swaps and 140 queries per attack instance. In contrast, indiscriminate targeting requires an average of 2 word swaps and 150 queries per instance.
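The abstract names the attack families (targeted, indiscriminate, adaptive) but does not spell out their mechanics. As a rough illustration only, a greedy black-box word-swap attack against a scoring API can be sketched as follows; `query_api`, the synonym table, and the query/swap budgets are hypothetical placeholders, not the authors' setup.

```python
import random

def query_api(text: str) -> dict:
    # Hypothetical stand-in for a third-party emotion-recognition API client;
    # it returns a confidence score per emotion label. Replace with a real client.
    return {"anger": random.random(), "joy": random.random()}

def word_swap_attack(text, target_label, candidates, max_queries=150, max_swaps=2):
    """Greedy black-box word-swap attack: try substitutions one position at a
    time and keep a swap only if the API's confidence in the target label rises."""
    words = text.split()
    best_conf = query_api(text)[target_label]
    queries, swaps = 1, 0
    for i, w in enumerate(words):
        if swaps >= max_swaps or queries >= max_queries:
            break
        for alt in candidates.get(w, []):
            trial = words.copy()
            trial[i] = alt
            conf = query_api(" ".join(trial))[target_label]
            queries += 1
            if conf > best_conf:  # keep the first substitution that helps
                best_conf, words, swaps = conf, trial, swaps + 1
                break
            if queries >= max_queries:
                break
    return " ".join(words), best_conf, queries, swaps

# Toy run with a hand-made substitution table (illustrative only).
adv_text, conf, n_queries, n_swaps = word_swap_attack(
    "i am fine with this", "anger",
    candidates={"fine": ["furious", "upset"], "this": ["that"]})
print(adv_text, conf, n_queries, n_swaps)
```

The swap and query counters in the sketch mirror the efficiency metrics the abstract reports (average word swaps and queries per attack instance), but the search strategy itself is an assumption.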
Citations: 0
An acoustic analysis of the nasal electrolarynx in healthy participants
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-10-01 | DOI: 10.1016/j.specom.2025.103315
Ching-Hung Lai , Shu-Wei Tsai , Chenhao Chiu , Yung-An Tsou , Ting-Shou Chang , David Shang-Yu Hung , Miyuki Hsing-Chun Hsieh , I-Pei Lee , Tammy Tsai
The nasal electrolarynx (NEL) is an innovative device that assists patients without vocal folds or under endotracheal intubation in producing speech sounds. The NEL has a different path for acoustic wave transmission to the traditional electrolarynx that starts from the nostril, passes through the nasal cavity, velopharyngeal port, and oral cavity, and exits the lips. There are several advantages to the NEL, including being non-handheld and not requiring a specific “sweet spot.” However, little is known about the acoustic characteristics of the NEL. This study investigated the acoustic characteristics of the NEL compared to normal speech using ten participants involved in two vowel production sessions. Compared to normal speech, NEL speech had low-frequency deficits in the linear predictive coding spectrum, higher first and second formants, decreased amplitude of the first formant, and increased amplitude of the nasal pole. The results identify the general acoustic features of the NEL, which are discussed using a tube model of the vocal tract and perturbation theory. Understanding the acoustic properties of NEL will help refine the acoustic source and speech recognition in future studies.
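The tube-model reasoning mentioned at the end is not expanded in the abstract. For orientation, a uniform tube closed at one end (a standard first approximation of the vocal tract) resonates at the quarter-wavelength frequencies

    F_n = (2n - 1) * c / (4L),    n = 1, 2, 3, ...

where c is the speed of sound (roughly 350 m/s in warm, humid air) and L is the effective length of the transmission path. Lengthening the path or adding constrictions shifts these resonances, which is the style of argument a tube-model and perturbation-theory discussion of the NEL relies on; the formula and values here are textbook background, not results from this paper.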
Citations: 0
LORT: Locally refined convolution and Taylor transformer for monaural speech enhancement
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-30 | DOI: 10.1016/j.specom.2025.103314
Junyu Wang , Zizhen Lin , Tianrui Wang , Meng Ge , Longbiao Wang , Jianwu Dang
Achieving superior enhancement performance while maintaining a low parameter count and computational complexity remains a challenge in the field of speech enhancement. In this paper, we introduce LORT, a novel architecture that integrates spatial-channel enhanced Taylor Transformer and locally refined convolution for efficient and robust speech enhancement. We propose a Taylor multi-head self-attention (T-MSA) module enhanced with spatial-channel enhancement attention (SCEA), designed to facilitate inter-channel information exchange and alleviate the spatial attention limitations inherent in Taylor-based Transformers. To complement global modeling, we further present a locally refined convolution (LRC) block that integrates convolutional feed-forward layers, time–frequency dense local convolutions, and gated units to capture fine-grained local details. Built upon a U-Net-like encoder–decoder structure with only 16 output channels in the encoder, LORT processes noisy inputs through multi-resolution T-MSA modules using alternating downsampling and upsampling operations. The enhanced magnitude and phase spectra are decoded independently and optimized through a composite loss function that jointly considers magnitude, complex, phase, discriminator, and consistency objectives. Experimental results on the VCTK+DEMAND and DNS Challenge datasets demonstrate that LORT achieves competitive or superior performance to state-of-the-art (SOTA) models with only 0.96M parameters, highlighting its effectiveness for real-world speech enhancement applications with limited computational resources.
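The composite loss is only listed by its terms. A minimal PyTorch-style sketch of how magnitude, complex, phase, and STFT-consistency terms are commonly combined is given below; the adversarial (discriminator) term is omitted, and the weights and exact formulations are illustrative assumptions rather than the paper's values.

```python
import torch

def composite_loss(est_mag, est_phase, ref_wav, n_fft=512, hop=128,
                   w_mag=0.9, w_cplx=0.1, w_phase=0.05, w_cons=0.1):
    """Illustrative combination of magnitude, complex, phase and STFT-consistency
    losses; est_mag/est_phase are the decoded spectra, ref_wav the clean waveform."""
    window = torch.hann_window(n_fft)
    ref = torch.stft(ref_wav, n_fft, hop, window=window, return_complex=True)
    ref_mag, ref_phase = ref.abs(), torch.angle(ref)

    est_cplx = est_mag * torch.exp(1j * est_phase)

    l_mag = torch.mean((est_mag - ref_mag) ** 2)
    l_cplx = torch.mean(torch.abs(est_cplx - ref) ** 2)
    # anti-wrapping phase distance
    l_phase = torch.mean(1 - torch.cos(est_phase - ref_phase))
    # consistency: re-analyse the re-synthesised waveform and compare spectra
    wav = torch.istft(est_cplx, n_fft, hop, window=window, length=ref_wav.shape[-1])
    re = torch.stft(wav, n_fft, hop, window=window, return_complex=True)
    l_cons = torch.mean(torch.abs(re - est_cplx) ** 2)

    return w_mag * l_mag + w_cplx * l_cplx + w_phase * l_phase + w_cons * l_cons

# Toy usage: pretend the network output equals the noisy input's spectrum.
ref_wav = torch.randn(16000)
noisy = ref_wav + 0.1 * torch.randn(16000)
spec = torch.stft(noisy, 512, 128, window=torch.hann_window(512), return_complex=True)
print(composite_loss(spec.abs(), torch.angle(spec), ref_wav).item())
```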
Citations: 0
MDCNN: A multimodal dual-CNN recursive model for fake news detection via audio- and text-based speech emotion recognition
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-24 | DOI: 10.1016/j.specom.2025.103313
Hongchen Wu, Hongxuan Li, Xiaochang Fang, Mengqi Tang, Hongzhu Yu, Bing Yu, Meng Li, Zhaorong Jing, Yihong Meng, Wei Chen, Yu Liu, Chenfei Sun, Shuang Gao, Huaxiang Zhang
The increasing complexity and diversity of emotional expression pose challenges when identifying fake news conveyed through text and audio formats. Integrating emotional cues derived from data offers a promising approach for balancing the tradeoff between the volume and quality of data. Leveraging recent advancements in speech emotion recognition (SER), our study proposes a Multimodal Recursive Dual-Convolutional Neural Network Model (MDCNN) for fake news detection, with a focus on sentiment analysis based on audio and text. Our proposed model employs convolutional layers to extract features from both audio and text inputs, facilitating an effective feature fusion process for sentiment classification. Through a deep bidirectional recursive encoder, the model can better understand audio and text features for determining the final emotional category. Experiments conducted on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, which contains 5531 samples across four emotion types—anger, happiness, neutrality, and sadness—demonstrate the superior performance of the MDCNN. Its weighted average precision (WAP) is 78.8 %, which is 2.5 % higher than that of the best baseline. Compared with the existing sentiment analysis models, our approach exhibits notable enhancements in terms of accurately detecting neutral categories, thereby addressing a common challenge faced by the prior models. These findings underscore the efficacy of the MDCNN in multimodal sentiment analysis tasks and its significant achievements in neutral category classification tasks, offering a robust solution for precisely detecting fake news and conducting nuanced emotional analyses in speech recognition scenarios.
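A minimal sketch of the dual-branch idea (one convolutional encoder per modality, concatenated for four-way emotion classification) is shown below. The layer sizes are arbitrary and the deep bidirectional recursive encoder is omitted, so this illustrates only the fusion pattern, not the authors' MDCNN.

```python
import torch
import torch.nn as nn

class DualCNN(nn.Module):
    """Toy two-branch model: 1-D convolutions over an audio feature sequence
    (e.g. MFCC frames) and over text token embeddings, fused for 4-way
    emotion classification (anger / happiness / neutrality / sadness)."""
    def __init__(self, n_audio_feats=40, vocab=5000, emb=64, n_classes=4):
        super().__init__()
        self.audio = nn.Sequential(
            nn.Conv1d(n_audio_feats, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.embed = nn.Embedding(vocab, emb)
        self.text = nn.Sequential(
            nn.Conv1d(emb, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.head = nn.Linear(128, n_classes)

    def forward(self, audio_feats, token_ids):
        a = self.audio(audio_feats).squeeze(-1)                            # (B, 64)
        t = self.text(self.embed(token_ids).transpose(1, 2)).squeeze(-1)   # (B, 64)
        return self.head(torch.cat([a, t], dim=-1))                        # (B, n_classes)

# Toy forward pass: batch of 2, 40-dim audio features over 300 frames, 50 tokens.
model = DualCNN()
logits = model(torch.randn(2, 40, 300), torch.randint(0, 5000, (2, 50)))
print(logits.shape)  # torch.Size([2, 4])
```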
Citations: 0
Phonetic reduction is associated with positive assessment and other pragmatic functions
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-17 | DOI: 10.1016/j.specom.2025.103305
Nigel G. Ward, Raul O. Gomez, Carlos A. Ortega, Georgina Bugarini
A fundamental goal of speech science is to inventory the meaning-conveying elements of human speech. This article provides evidence for including phonetic reduction in this inventory. Based on analysis of dialog data, we find that phonetic reduction is common with several important pragmatic functions, including the expression of positive assessment, in both American English and Mexican Spanish. For American English, we confirm, in a controlled experiment, that people speaking in a positive tone generally do indeed use more reduced forms.
Citations: 0
MC-Mamba: Cross-modal target speaker extraction model based on multiple consistency
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-16 | DOI: 10.1016/j.specom.2025.103304
Ke Lv , Yuanjie Deng , Ying Wei
Target speaker extraction technology aims to extract the target speaker’s speech from mixed speech based on related cues. When using visual information as a cue, there exists a heterogeneity problem between audio and visual modalities, as they are different modalities. Therefore, some works have extracted visual features consistent with the target speech to mitigate the heterogeneity problem. However, most methods only consider a single type of consistency, which is insufficient to mitigate the modality gap. Furthermore, time-domain speaker extraction models still face modeling challenges when processing speech with numerous time steps. In this work, we propose MC-Mamba, a cross-modal target speaker extraction model based on multiple consistency. We design a consistent visual feature extractor to extract visual features that are consistent with the target speaker’s identity and content. Content-consistent visual features are used for audio–visual feature fusion, while identity-consistent visual features constrain the identity of separated speech. Notably, when extracting content-consistent visual features, our method does not rely on additional text datasets as labels, as is common in other works, enhancing its practical applicability. The Mamba blocks within the model efficiently process long speech signals by capturing both local and global information. Comparative experimental results show that our proposed speaker extraction model outperforms other state-of-the-art models in terms of speech quality and clarity.
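One hedged way to realise a reconstruction objective plus identity- and content-consistency constraints is sketched below. The SI-SDR reconstruction term, the cosine-embedding form of the consistency terms, and the weights are assumptions for illustration; the abstract does not specify the actual objectives.

```python
import torch
import torch.nn.functional as F

def multi_consistency_loss(est_wav, ref_wav, vis_id_emb, vis_content_emb,
                           spk_encoder, content_encoder, w_id=0.1, w_ct=0.1):
    """Sketch: SI-SDR-style reconstruction loss plus cosine alignment between
    visual embeddings and embeddings extracted from the separated speech."""
    # scale-invariant SDR (negated, so lower is better)
    ref = ref_wav - ref_wav.mean(-1, keepdim=True)
    est = est_wav - est_wav.mean(-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) / (ref.pow(2).sum(-1, keepdim=True) + 1e-8) * ref
    si_sdr = 10 * torch.log10(proj.pow(2).sum(-1) / ((est - proj).pow(2).sum(-1) + 1e-8))
    l_rec = -si_sdr.mean()

    # identity consistency: speaker embedding of the estimate vs. visual identity embedding
    l_id = 1 - F.cosine_similarity(spk_encoder(est_wav), vis_id_emb, dim=-1).mean()
    # content consistency: content embedding of the estimate vs. visual (lip) content embedding
    l_ct = 1 - F.cosine_similarity(content_encoder(est_wav), vis_content_emb, dim=-1).mean()
    return l_rec + w_id * l_id + w_ct * l_ct

# Toy usage with stand-in encoders (first 256 samples as a fake "embedding").
enc = lambda wav: wav[..., :256]
loss = multi_consistency_loss(torch.randn(2, 16000), torch.randn(2, 16000),
                              torch.randn(2, 256), torch.randn(2, 256), enc, enc)
print(loss.item())
```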
Citations: 0
Domain adaptation using non-parallel target domain corpus for self-supervised learning-based automatic speech recognition
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-15 | DOI: 10.1016/j.specom.2025.103303
Takahiro Kinouchi , Atsunori Ogawa , Yukoh Wakabayashi , Kengo Ohta , Norihide Kitaoka
The recognition accuracy of conventional automatic speech recognition (ASR) systems depends heavily on the amount of speech and associated transcription data available in the target domain for model training. However, preparing parallel speech and text data each time a model is trained for a new domain is costly and time-consuming. To solve this problem, we propose a method of domain adaptation that does not require the use of a large amount of parallel target domain training data, as most of the data used for model training is not from the target domain. Instead, only target domain speech is used for model training, along with non-target domain speech and its parallel text data, i.e., the domains and contents of the two types of training data do not correspond to one another. Collecting this type of training data is relatively inexpensive. Domain adaptation is performed in two steps: (1) A pre-trained wav2vec 2.0 model is further pre-trained using a large amount of target domain speech data and is then fine-tuned using a large amount of non-target domain speech and its transcriptions. (2) The density ratio approach (DRA) is applied during inference to a language model (LM) trained using target domain text unrelated to, and independently from, the wav2vec 2.0 training. Experimental evaluation illustrated that the proposed domain adaptation obtained character error rate (CER) 10.4 pts lower than baseline with wav2vec 2.0 and 3.9 pts with XLS-R under the situation that the parallel target domain data is unavailable against the target domain test set, achieving 34.4% and 16.2% reductions in relative CER.
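The density ratio approach referenced here is commonly formulated as a shallow-fusion score during decoding, combining the end-to-end model with a target-domain LM and discounting a source-domain LM:

    score(y) = log P_E2E(y | x) + λ_T · log P_T(y) − λ_S · log P_S(y)

where P_E2E is the fine-tuned acoustic-to-text model, P_T the LM trained on target-domain text, P_S an LM for the source (training) domain, and λ_T, λ_S the fusion weights; the hypothesis maximising this score is selected during beam search. Whether the paper uses exactly this weighting scheme is an assumption based on the standard DRA formulation.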
Citations: 0
DVSA: A focused and efficient sparse attention via explicit selection for speech recognition
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-04 | DOI: 10.1016/j.specom.2025.103300
Minghan Zhang , Jing Song , Fei Xie , Ke Shi , Zhiyuan Guo , Fuliang Weng
Self-attention (SA) originally demonstrated its powerful ability in handling text sequences in machine translation tasks, and some studies have successfully applied it to automatic speech recognition (ASR) models. However, speech sequences exhibit significantly lower information density than text sequences, containing abundant silent, repetitive, and overlapping segments. Recent studies have also pointed out that the full attention mechanism used to extract global dependency relationships is not indispensable for state-of-the-art ASR models. Conventional full attention consumes quadratic computational complexity, and may extract redundant or even negative information. To address this, we propose Diagonal and Vertical Self-Attention (DVSA), a sparse attention mechanism for ASR. To extract more focused dependencies from the speech sequence with higher efficiency, we optimize the traditional SA calculation process by explicitly selecting and calculating only a subset of important dot products. This eliminates the misleading effect of dot products with common query degrees on the model and greatly alleviates the quadratic computational complexity. Experiments on LibriSpeech and Aishell-1 show that DVSA improves the performance of a Conformer-based model (a dominant architecture in ASR) by 6.5 % and 5.7 % respectively over traditional full attention, while significantly reducing computational complexity. Notably, DVSA enables reducing encoder layers by 33 % without performance degradation, yielding additional savings in parameters and computation. As a result, this new approach achieves the improvements in all three major metrics: accuracy, model size, and training and testing time efficiency.
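The selection rule is not specified in the abstract. One way to illustrate a "diagonal plus vertical" sparsity pattern is a mask that keeps a local band around the diagonal and a few globally attended key positions, as in the sketch below; the band width and column spacing are arbitrary, and masking after computing all dot products only illustrates the pattern (an efficient implementation would gather just the selected products).

```python
import torch

def diag_vert_mask(seq_len, band=8, vert_cols=None):
    """Boolean attention mask: True = keep the dot product.
    Diagonal part: a local band of +/- `band` frames around each query.
    Vertical part: a few key positions attended by every query."""
    idx = torch.arange(seq_len)
    mask = (idx[None, :] - idx[:, None]).abs() <= band          # (L, L) diagonal band
    mask = mask.clone()
    for c in (vert_cols or range(0, seq_len, seq_len // 4 or 1)):
        mask[:, c] = True                                        # vertical stripes
    return mask

def sparse_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions set to -inf."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

L, d = 64, 32
q = k = v = torch.randn(L, d)
out = sparse_attention(q, k, v, diag_vert_mask(L))
print(out.shape)  # torch.Size([64, 32])
```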
Citations: 0
Benefits of musical experience on whistled consonant categorization: analyzing the cognitive transfer processes
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-02 | DOI: 10.1016/j.specom.2025.103302
Anaïs Tran Ngoc , Julien Meyer , Fanny Meunier
In this study, we investigated the transfer of musical skills to speech perception by analyzing the perception and categorization of consonants produced in whistled speech, a naturally modified speech form. The study had two main objectives: (i) to explore the effects of different levels of musical skill on speech perception, and (ii) to better understand the type of skills transferred by focusing on a group of high-level musicians, playing various instruments. Within this high-level group, we aimed to disentangle general cognitive transfers from sound-specific transfers by considering instrument specialization, contrasting general musical knowledge (shared by all instruments) with instrument-specific ones. We focused on four instruments: voice, violin, piano and flute. Our results confirm a general musical advantage and suggest that only a small amount of musical experience is sufficient for musical skills to benefit whistled speech perception. However, higher-level musicians reached better performances, with differences for specific consonants. Moreover, musical expertise appears to enhance rapid adaptation to the whistled signal throughout the experiment and our results highlight the specificity of instrument expertise. Consistent with previous research showing the impact of the instrument played, the differences observed in whistled speech processing among high-level musicians seem to be primarily due to instrument-specific expertise.
Citations: 0
Individual differences in language acquisition: The impact of study abroad on native English speakers learning Spanish
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-09-02 | DOI: 10.1016/j.specom.2025.103301
Ratree Wayland , Rachel Meyer , Sophia Vellozzi , Kevin Tang
This study investigated the acquisition of lenition in Spanish voiced stops (/b, d, ɡ/) by native English speakers during a study-abroad program, focusing on individual differences and influencing factors. Lenition, characterized by the weakening of stops into fricative-like ([β], [ð], [ɣ]) or approximant-like ([β̞], [ð̞], [ɣ̞]) forms, poses challenges for L2 learners due to its gradient nature and the absence of analogous approximant forms in English. Results indicated that learners aligned with native speakers in recognizing voicing as the primary cue for lenition, yet their productions diverged, favoring fricative-like over approximant-like realizations. This preference reflects the combined influence of articulatory ease, acoustic salience, and cognitive demands.
Individual variability in learners’ trajectories highlights the role of exposure to native input and sociolinguistic engagement. Learners benefitting from richer, informal interactions with native speakers showed greater alignment with native patterns, while others demonstrated more limited progress. However, native input alone was insufficient for learners to internalize subtler distinctions such as place of articulation and stress. These findings emphasize the need for combining immersive experiences with targeted instructional strategies to address articulatory and cognitive challenges. This study contributes to the understanding of L2 phonological acquisition and offers insights for designing more effective language learning programs to support lenition acquisition in Spanish.
Citations: 0