
Speech Communication: Latest Publications

Using spatial sound reproduction for studying speech perception of listeners with different language immersion experiences
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103320
Yusuke Hioka , C.T. Justine Hui , Hinako Masuda , Yunqi C. Zhang , Eri Osawa , Takayuki Arai
This study evaluates a research method for studying speech perception of listeners with different language backgrounds under practical acoustic environments. The proposed method utilises spatial sound reproduction, an emerging technology that enables arbitrary acoustic environments to be reproduced in controlled laboratory settings, to test participants recruited at multiple geographically distant locations. To validate the method, the current study conducted a listening test in a real seminar room and a chapel, as well as under a spherical harmonics-based spatial sound reproduction that reproduced the acoustics of the two venues up to the third order, and investigated differences between the results collected from the two test types. Three groups of participants with different levels of immersion in New Zealand English were recruited in Auckland, New Zealand and Tokyo, Japan. The experimental results show that spatial sound reproduction is able to capture the advantage of first language (L1) listeners in correctly understanding speech in noise and reverberation, but is not sensitive enough to describe the subtle differences among second language (L2) listeners with different levels of language immersion experience. The research method is also partially able to describe how well listeners benefit from spatial release from masking, regardless of their language immersion experience, under room acoustics with higher speech clarity (C50), and may represent the effect of the real room's acoustics within a certain range of room acoustic conditions characterised by speech clarity.
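The speech clarity index C50 referenced above is the ratio, in decibels, of early (first 50 ms) to late energy in a room impulse response; the third-order spherical-harmonic reproduction mentioned implies (3+1)² = 16 ambisonic channels. A minimal sketch of the C50 computation follows; the function name and the synthetic decaying impulse response are illustrative assumptions, not material from the paper.

```python
import numpy as np

def c50(ir, fs):
    """Speech clarity C50: 10*log10 of early (0-50 ms) over late (>50 ms)
    energy in a room impulse response `ir` sampled at `fs` Hz."""
    n50 = int(0.050 * fs)
    early = np.sum(ir[:n50] ** 2)
    late = np.sum(ir[n50:] ** 2)
    return 10.0 * np.log10(early / late)

# Synthetic exponentially decaying impulse response (decay constant 0.2 s)
fs = 16000
t = np.arange(fs) / fs          # 1 s of samples
ir = np.exp(-t / 0.2)
print(f"C50 = {c50(ir, fs):.1f} dB")
```

Higher C50 means more of the impulse-response energy arrives early, which generally favours intelligibility; a measured room impulse response would be substituted for the synthetic decay here.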
Citations: 0
Trading accuracy for fluency? An investigation of word retrieval difficulties in connected speech
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103325
Amber Römkens , Aurélie Pistono
Many authors view disfluencies as by-products of speech encoding difficulties, but it remains unclear which connected-speech phenomena genuinely reflect word-form retrieval problems. This study examined the relationship between retrieval difficulty, disfluency production, and individual differences in typically aging older adults, in light of both the Transmission Deficit Hypothesis (TDH) and the Inhibition Deficit Hypothesis (IDH). Twenty-five native Dutch-speaking adults aged 60 to 73 completed a connected-speech network task in which lexical frequency (high vs. low) was manipulated. Disfluencies and related responses were annotated and analyzed using generalized linear mixed-effects models. Lexical frequency did not affect the overall likelihood of disfluencies, arguing against IDH and against extending TDH to disfluency production. However, low-frequency words did increase semantically related answers, which could be consistent with the TDH. Vocabulary knowledge provided additional protection, with higher scores predicting fewer semantic alternatives. These findings suggest that disfluencies are not simply symptoms of retrieval failure. Implications are discussed in relation to “good-enough language production” and methodological challenges of capturing language production difficulties in cognitively demanding but ecologically valid tasks and contexts.
Citations: 0
Arabic dialects speech corpora: A systematic review
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103322
Ammar Mohammed Ali Alqadasi , Akram M. Zeki , Mohd Shahrizal Sunar , Siti Zaiton Mohd Hashim , Md Sah hj Salam , Rawad Abdulghafor
Speech processing applications are crucial in various domains, necessitating reliable speech recognition systems built upon suitable speech databases. However, the availability of comprehensive resources for the Arabic language remains limited compared to other languages such as English. A systematic review was conducted to identify, analyze, and classify existing Arabic dialect speech databases. Initially, online digital databases and search engines were identified to collect a diverse range of manuscripts for thorough examination. The review encompassed 30 publicly accessible databases and an additional 39 self-built databases, which were thoroughly studied, classified based on their characteristics, and subjected to a detailed analysis of research trends. This paper offers a comprehensive discussion of the diverse speech databases developed for various speech processing applications, highlighting the purposes and unique characteristics of Arabic speech databases. By providing valuable insights into their availability, characteristics, challenges, and research directions, this review aims to facilitate researchers' access to suitable resources for their specific applications, encourage the creation of new datasets in underrepresented areas, and promote open and easily accessible databases. Furthermore, the findings contribute to bridging the gap in available Arabic speech databases and serve as a valuable resource for researchers in the field.
Citations: 0
Predicting speech intelligibility in older adults for speech enhancement using the Gammachirp Envelope Similarity Index, GESI
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103318
Ayako Yamamoto , Fuki Miyazaki , Toshio Irino
We propose an objective intelligibility measure (OIM), called the Gammachirp Envelope Similarity Index (GESI), that can predict speech intelligibility (SI) in older adults. GESI is a bottom-up model based on psychoacoustic knowledge from the peripheral to the central auditory system. It computes the single SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. It takes into account not only the hearing level represented in the audiogram, but also the temporal processing characteristics captured by the temporal modulation transfer function (TMTF). To evaluate performance, SI experiments were conducted with older adults of various hearing levels using speech-in-noise with ideal speech enhancement on familiarity-controlled Japanese words. The prediction performance was compared with HASPIw2, which was developed for keyword SI prediction. The results showed that GESI predicted the subjective SI scores more accurately than HASPIw2. GESI was also found to be at least as effective as, if not more effective than, HASPIv2 in predicting English sentence-level SI. The effect of introducing TMTF into the GESI algorithm was insignificant, suggesting that TMTF measurements and models are not yet mature. Therefore, it may be necessary to perform TMTF measurements with bandpass noise and to improve the incorporation of temporal characteristics into the model.
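GESI's single metric builds on an extended cosine similarity between auditory representations of reference and degraded speech. As an illustration only (the real model uses the gammachirp and modulation filterbanks; this toy version substitutes a crude rectify-and-smooth envelope, and all names are assumptions), a cosine similarity between two temporal envelopes might look like:

```python
import numpy as np

def envelope(x, win=160):
    """Crude temporal envelope: rectify, then smooth with a moving average
    (a stand-in for the gammachirp/modulation filterbank front end)."""
    k = np.ones(win) / win
    return np.convolve(np.abs(x), k, mode="same")

def envelope_cosine(ref, deg, eps=1e-12):
    """Cosine similarity between envelopes of reference and degraded speech."""
    a, b = envelope(ref), envelope(deg)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.arange(16000) / 16000)  # toy "speech"
noisy = clean + 0.5 * rng.standard_normal(16000)
print(round(envelope_cosine(clean, noisy), 3))
```

Because envelopes are non-negative, the similarity lies between 0 and 1, with 1 indicating identical envelope shapes; GESI additionally weights such comparisons by hearing level and temporal-processing characteristics.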
Citations: 0
Dynamic graph learning with gated convolutions for single-channel speech separation
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103321
Meng Zhang, Xinyu Jia, Yina Guo
Single-channel speech separation remains challenging due to the need for joint modeling of time-varying spectral patterns and spatial interactions between overlapping sources. While deep learning methods excel at temporal sequence processing, their fixed geometric representations inherently limit dynamic spatial relationship modeling. In this paper, we propose a novel dynamically learned Gated Dense Graph Convolutional Network (GDGCN) that overcomes the limitations of spatiotemporal dynamic modeling in speech separation. Specifically, we employ an adaptive hybrid topology integrating complete and K-partite graph structures to explicitly model multi-scale spatial dependencies between sound sources. Furthermore, we design a novel gating mechanism for speech graph data that maps node features to an information selection space through learnable projection matrices, dynamically regulating inter-node information flow. This architecture enables effective modeling of time-varying couplings without being constrained by static parameters. Experimental evaluations on benchmark datasets demonstrate the superior performance of our method for speech separation under noisy conditions as evidenced by objective metrics.
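The gating idea described, mapping node features through a learnable projection to regulate inter-node information flow, can be sketched as a single message-passing step. All names, the residual update, and the toy complete-graph adjacency below are illustrative assumptions, not the authors' GDGCN.

```python
import numpy as np

def gated_graph_step(X, A, Wg, Wm):
    """One gated message-passing step.
    X: (N, d) node features; A: (N, N) adjacency (row-normalised);
    Wg, Wm: (d, d) learnable gate / message projections (hypothetical)."""
    gate = 1.0 / (1.0 + np.exp(-(X @ Wg)))  # sigmoid gate per node and dim
    msgs = A @ (X @ Wm)                     # aggregate neighbour messages
    return X + gate * msgs                  # gate regulates information flow

rng = np.random.default_rng(0)
N, d = 5, 8
X = rng.standard_normal((N, d))
A = np.ones((N, N)) / N                     # toy complete-graph topology
Wg, Wm = rng.standard_normal((d, d)), rng.standard_normal((d, d))
print(gated_graph_step(X, A, Wg, Wm).shape)  # (5, 8)
```

Swapping `A` per time frame is what makes the topology dynamic; a K-partite structure would simply zero out within-partition entries of `A`.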
Citations: 0
FinnAffect: An affective speech corpus for spontaneous Finnish
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103327
Kalle Lahtinen , Liisa Mustanoja , Okko Räsänen
Affective expression plays a major role in everyday spoken and written language. In order to study how affect is expressed by Finnish language users in day-to-day life, data consisting of samples from naturalistic and unscripted contexts is required. The present work describes the first spontaneous speech corpus for Finnish with affect-related annotations, containing 12,000 transcribed samples of unscripted speech paired with continuous-valued scores of valence and arousal marked by five native Finnish speakers. We first describe the creation of the corpus, based on combining speech samples from three large-scale Finnish speech corpora, from which we chose samples for annotation using an active learning-based affect mining approach. We then report characteristics of the resulting corpus and annotation consistency, followed by speech emotion recognition (SER) experiments with several classifiers and regression models to test the feasibility of the corpus for SER system development and evaluation. Annotation analyses reveal mean Pearson correlations between annotator scores and the mean of all annotators to be ρ_mean = 0.856 for valence and ρ_mean = 0.898 for arousal. The SER experiments on discretized labels result in an average unweighted average recall (UAR) of 0.458 for ternary valence classification and 0.719 for binary arousal classification using a fine-tuned ExHuBERT model for valence prediction and a support vector machine (SVM) classifier for arousal prediction, reaching comparable levels to those reported earlier for spontaneous speech. For the regression task, concordance correlation coefficients of 0.270 and 0.689 were obtained for valence and arousal, respectively, when using a WavLM-based model trained on MSP-Podcast corpus and fine-tuned on the target data. Overall, the analyses suggest that the corpus provides a feasible basis for later study on affective expression in spontaneous Finnish.
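UAR, the classification metric reported here, is simply the mean of per-class recalls, which makes it insensitive to class imbalance (a chance-level ternary classifier scores about 0.33 regardless of class frequencies). A minimal sketch with made-up labels:

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# e.g. ternary valence labels (negative=0, neutral=1, positive=2)
print(uar([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

Here each class contributes equally (recalls 0.5, 1.0, 0.5 average to 2/3) even though raw accuracy would weight frequent classes more heavily.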
Citations: 0
Direct speech-to-speech neural machine translation: A survey
IF 3.0 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2025-10-16 | DOI: 10.1016/j.specom.2025.103317
Mahendra Gupta , Maitreyee Dutta , Chandresh Kumar Maurya
Speech-to-Speech Translation (S2ST) models transform speech in one language into speech in another target language while preserving the same linguistic information. S2ST is important for bridging communication gaps between communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which have the potential to translate speech without relying on intermediate text generation, offer better decoding latency, and can preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to achieve the quality needed for seamless communication and still lags behind cascade models in performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey of direct S2ST systems is available that beginners and advanced researchers alike can consult for a quick overview. The present work extensively reviews direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance on benchmark datasets and outline research challenges and future directions.
Citations: 0
Robustness of emotion recognition in dialogue systems: A study on third-party API integrations and black-box attacks
IF 3 · CAS Tier 3 (Computer Science) · Q2 (ACOUSTICS) · Pub Date: 2025-10-13 · DOI: 10.1016/j.specom.2025.103316
Fatma Gumus , M. Fatih Amasyali
There is an intricate interplay between third-party AI application programming interfaces and adversarial machine learning. The investigation centers on vulnerabilities inherent in AI models utilizing multiple black-box APIs, with a particular emphasis on their susceptibility to attacks in the domains of speech and text recognition. Our exploration spans a spectrum of attack strategies, encompassing targeted, indiscriminate, and adaptive targeting approaches, each carefully designed to exploit unique facets of multi-modal inputs. The results underscore the intricate balance between attack success, average target class confidence, and the density of swaps and queries. Remarkably, targeted attacks exhibit an average success rate of 76%, while adaptive targeting achieves an even higher rate of 88%. Conversely, indiscriminate attacks attain an intermediate success rate of 73%, highlighting their potency even in the absence of strategic tailoring. Moreover, our strategies’ efficiency is evaluated through a resource utilization lens. Our findings reveal adaptive targeting as the most efficient approach, with an average of 2 word swaps and 140 queries per attack instance. In contrast, indiscriminate targeting requires an average of 2 word swaps and 150 queries per instance.
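The swap-and-query accounting described in the abstract can be illustrated with a minimal greedy targeted word-swap attack against a black-box classifier. Everything below — the `classify` oracle returning `(label, target_class_confidence)`, the `synonyms` table, and the toy sentiment model — is a hypothetical stand-in for a real third-party API, not the paper's actual attack:

```python
def word_swap_attack(classify, text, target_label, synonyms, max_queries=150):
    """Greedy targeted word-swap attack against a black-box oracle.

    classify(text) -> (predicted_label, target_class_confidence) is the
    only access to the model. A swap is kept whenever it raises the
    target-class confidence; the loop stops on success or budget.
    Returns (adversarial_text_or_None, n_swaps, n_queries).
    """
    words = text.split()
    queries, swaps = 1, 0
    _, conf = classify(" ".join(words))        # baseline confidence
    for i in range(len(words)):
        for cand in synonyms.get(words[i], []):
            trial = words[:i] + [cand] + words[i + 1:]
            queries += 1
            label, trial_conf = classify(" ".join(trial))
            if label == target_label:          # attack succeeded
                return " ".join(trial), swaps + 1, queries
            if queries >= max_queries:         # budget exhausted
                return None, swaps, queries
            if trial_conf > conf:              # keep a beneficial swap
                words, conf = trial, trial_conf
                swaps += 1
    return None, swaps, queries


# Toy oracle for illustration only (hypothetical, not a real API).
def toy_classify(t):
    conf = 0.9 if "awful" in t else 0.1
    return ("negative" if conf > 0.5 else "positive"), conf


print(word_swap_attack(toy_classify, "the movie was great",
                       "negative", {"great": ["awful"]}))
# -> ('the movie was awful', 1, 2)
```

The returned swap and query counts are exactly the per-instance statistics the study averages over attack instances.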
Citations: 0
An acoustic analysis of the nasal electrolarynx in healthy participants
IF 3 · CAS Tier 3 (Computer Science) · Q2 (ACOUSTICS) · Pub Date: 2025-10-01 · DOI: 10.1016/j.specom.2025.103315
Ching-Hung Lai , Shu-Wei Tsai , Chenhao Chiu , Yung-An Tsou , Ting-Shou Chang , David Shang-Yu Hung , Miyuki Hsing-Chun Hsieh , I-Pei Lee , Tammy Tsai
The nasal electrolarynx (NEL) is an innovative device that assists patients without vocal folds, or under endotracheal intubation, in producing speech sounds. Unlike the traditional electrolarynx, the NEL transmits the acoustic wave along a path that starts at the nostril, passes through the nasal cavity, velopharyngeal port, and oral cavity, and exits at the lips. The NEL has several advantages, including being non-handheld and not requiring a specific “sweet spot.” However, little is known about its acoustic characteristics. This study investigated the acoustic characteristics of the NEL compared to normal speech, with ten participants taking part in two vowel production sessions. Compared to normal speech, NEL speech showed low-frequency deficits in the linear predictive coding spectrum, higher first and second formants, decreased amplitude of the first formant, and increased amplitude of the nasal pole. The results identify the general acoustic features of the NEL, which are discussed using a tube model of the vocal tract and perturbation theory. Understanding the acoustic properties of the NEL will help refine the acoustic source and speech recognition in future studies.
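The tube-model reasoning invoked in the discussion can be sketched numerically: a uniform tube closed at one end (the source) and open at the other (the lips) resonates at the quarter-wavelength frequencies f_n = (2n − 1)c / 4L, which for a 17 cm neutral vocal tract gives the familiar ~500/1500/2500 Hz formants. This is a generic acoustics sketch, not the authors' model of the NEL path:

```python
def quarter_wave_formants(length_m, c=343.0, n_formants=3):
    """Resonances of a uniform tube closed at one end and open at the
    other: f_n = (2n - 1) * c / (4 * L), with c the speed of sound in
    m/s and L the tube length in metres."""
    return [(2 * n - 1) * c / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# A 0.17 m neutral vocal tract yields roughly 504, 1513, and 2522 Hz,
# close to the textbook schwa formants.
print([round(f) for f in quarter_wave_formants(0.17)])
# -> [504, 1513, 2522]
```

Lengthening the effective tube (e.g., routing the source through the nasal passage) lowers every resonance in this model, which is one way to reason about formant shifts between normal and NEL speech.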
Citations: 0
LORT: Locally refined convolution and Taylor transformer for monaural speech enhancement
IF 3 · CAS Tier 3 (Computer Science) · Q2 (ACOUSTICS) · Pub Date: 2025-09-30 · DOI: 10.1016/j.specom.2025.103314
Junyu Wang , Zizhen Lin , Tianrui Wang , Meng Ge , Longbiao Wang , Jianwu Dang
Achieving superior enhancement performance while maintaining a low parameter count and computational complexity remains a challenge in the field of speech enhancement. In this paper, we introduce LORT, a novel architecture that integrates spatial-channel enhanced Taylor Transformer and locally refined convolution for efficient and robust speech enhancement. We propose a Taylor multi-head self-attention (T-MSA) module enhanced with spatial-channel enhancement attention (SCEA), designed to facilitate inter-channel information exchange and alleviate the spatial attention limitations inherent in Taylor-based Transformers. To complement global modeling, we further present a locally refined convolution (LRC) block that integrates convolutional feed-forward layers, time–frequency dense local convolutions, and gated units to capture fine-grained local details. Built upon a U-Net-like encoder–decoder structure with only 16 output channels in the encoder, LORT processes noisy inputs through multi-resolution T-MSA modules using alternating downsampling and upsampling operations. The enhanced magnitude and phase spectra are decoded independently and optimized through a composite loss function that jointly considers magnitude, complex, phase, discriminator, and consistency objectives. Experimental results on the VCTK+DEMAND and DNS Challenge datasets demonstrate that LORT achieves competitive or superior performance to state-of-the-art (SOTA) models with only 0.96M parameters, highlighting its effectiveness for real-world speech enhancement applications with limited computational resources.
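The paper's T-MSA module is not reproduced here, but the general idea behind Taylor-expansion attention can be sketched: replacing exp(q·k) in softmax attention with its first-order expansion 1 + q·k (on L2-normalised q and k, so the kernel stays non-negative) makes the kernel linear in q and k, letting the key/value products be aggregated once and reused for every query — O(N·d²) instead of O(N²·d). A minimal NumPy sketch under those assumptions, without the paper's spatial-channel enhancement:

```python
import numpy as np


def taylor_attention(q, k, v):
    """First-order Taylor approximation of softmax attention.

    Weights w_ij = (1 + q_i·k_j) / sum_j (1 + q_i·k_j) after
    L2-normalising q and k; associativity lets us precompute
    k.T @ v and k.sum(0) once, giving linear complexity in N.
    """
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    n = k.shape[0]
    kv = k.T @ v                        # (d, d): aggregated once
    k_sum = k.sum(axis=0)               # (d,)
    num = v.sum(axis=0) + q @ kv        # (N, d)
    den = n + q @ k_sum                 # (N,): positive since |q·k| <= 1
    return num / den[:, None]
```

The linear-time form is algebraically identical to computing the (1 + q·k) weight matrix explicitly and normalising its rows, which is easy to verify on random inputs.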
Citations: 0