
Latest Publications in Speech Communication

Robust prosody modeling for synthetic speech detection
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-27 · DOI: 10.1016/j.specom.2025.103283
Ariel Cohen , Denis Shyrman , Aleksandr Solonskyi , Roman Frenkel , Arkady Krishtul , Oren Gal
This paper presents a comprehensive study on developing and implementing a speech prosody extractor to enhance audio security in Automatic Speaker Verification (ASV) systems. Our novel training approach, which operates without exposure to spoofing examples, significantly improves the modeling of essential prosodic elements often overlooked in deep fake attacks. By integrating codec and recording device embeddings, the prosody extractor effectively neutralizes codec-specific distortions, enhancing robustness across various audio transmission channels. Combined with state-of-the-art ASV systems, our prosody extractor reduces the Equal Error Rate (EER) by an average of 49.15% without codecs, 50.53% with the g711 codec, 44.77% with the g729 codec, 43.43% with the Vonage channel, 42.05% with ECAPA-TDNN, and 45.17% with TitaNet across diverse datasets, including high-quality commercial deep fakes. This integration markedly improves the detection and mitigation of sophisticated spoofing attempts, especially in compressed or altered audio environments. Our methodology also eliminates the dependency on textual data during training, enabling the use of larger and more varied datasets.
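The reported percentages read as relative EER reductions; assuming that convention, the short sketch below shows how such a figure is computed. The numeric values in the example are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of how a relative EER reduction such as "49.15%"
# is conventionally computed; the baseline and improved EERs below are made up.

def relative_eer_reduction(eer_baseline: float, eer_with_prosody: float) -> float:
    """Return the relative reduction in Equal Error Rate, as a percentage."""
    return 100.0 * (eer_baseline - eer_with_prosody) / eer_baseline

# Example: a baseline ASV EER of 4.0% dropping to 2.0% is a 50% relative reduction.
print(relative_eer_reduction(4.0, 2.0))  # -> 50.0
```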
Citations: 0
Phonology-guided speech-to-speech translation for African languages
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-26 · DOI: 10.1016/j.specom.2025.103287
Peter Ochieng , Dennis Kaburu
We present a prosody-guided framework for speech-to-speech translation (S2ST) that aligns and translates speech without transcripts by leveraging cross-linguistic pause synchrony. Analyzing a 6000-hour East African news corpus spanning five languages, we show that within-phylum language pairs exhibit 30%–40% lower pause variance and over 3× higher onset/offset correlation compared to cross-phylum pairs. These findings motivate SPaDA, a dynamic-programming alignment algorithm that integrates silence consistency, rate synchrony, and semantic similarity. SPaDA improves alignment F1 by +3–4 points and eliminates up to 38% of spurious matches relative to greedy VAD baselines. Using SPaDA-aligned segments, we train SegUniDiff, a diffusion-based S2ST model guided by external gradients from frozen semantic and speaker encoders. SegUniDiff matches an enhanced cascade in BLEU (30.3 on CVSS-C vs. 28.9 for UnitY), reduces speaker error rate (EER) from 12.5% to 5.3%, and runs at an RTF of 1.02. To support evaluation in low-resource settings, we also release a three-tier, transcript-free BLEU suite (M1–M3) that correlates strongly with human judgments. Together, our results show that prosodic cues in multilingual speech provide a reliable scaffold for scalable, non-autoregressive S2ST.
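The abstract describes SPaDA only at a high level: a dynamic-programming aligner combining silence consistency, rate synchrony, and semantic similarity. The sketch below is a generic Needleman-Wunsch-style aligner over segment features written under those assumptions; the feature names, weights, and skip penalty are illustrative, not the authors' implementation.

```python
# A minimal, assumption-laden sketch of a SPaDA-style segment aligner: one
# score combining silence consistency, rate synchrony, and semantic similarity,
# plus a monotone dynamic-programming alignment with a skip penalty.
import numpy as np

def segment_score(src, tgt, w=(0.4, 0.3, 0.3)):
    """Score one candidate pair of segments (dicts with toy features)."""
    silence = -abs(src["pause"] - tgt["pause"])           # silence consistency
    rate = -abs(src["rate"] - tgt["rate"])                 # rate synchrony
    sem = float(np.dot(src["emb"], tgt["emb"]) /
                (np.linalg.norm(src["emb"]) * np.linalg.norm(tgt["emb"]) + 1e-9))
    return w[0] * silence + w[1] * rate + w[2] * sem

def align(src_segs, tgt_segs, skip_penalty=-0.5):
    """Return matched (source_index, target_index) pairs via DP traceback."""
    n, m = len(src_segs), len(tgt_segs)
    dp = np.full((n + 1, m + 1), -np.inf)
    dp[0, :] = np.arange(m + 1) * skip_penalty
    dp[:, 0] = np.arange(n + 1) * skip_penalty
    back = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = {
                (i - 1, j - 1): dp[i - 1, j - 1] + segment_score(src_segs[i - 1], tgt_segs[j - 1]),
                (i - 1, j): dp[i - 1, j] + skip_penalty,      # skip a source segment
                (i, j - 1): dp[i, j - 1] + skip_penalty,      # skip a target segment
            }
            back[(i, j)] = max(cands, key=cands.get)
            dp[i, j] = cands[back[(i, j)]]
    pairs, (i, j) = [], (n, m)
    while i > 0 and j > 0:
        pi, pj = back[(i, j)]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    return pairs[::-1]
```

A call would pass lists of per-segment feature dicts for the two languages, e.g. {"pause": 0.30, "rate": 4.1, "emb": np.ones(8)}, with the embeddings coming from whatever semantic encoder is available.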
Citations: 0
Prosodic characteristics of deceptive picture descriptions in Finnish: Acoustics, beliefs, self-evaluations, and deception theories
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-14 · DOI: 10.1016/j.specom.2025.103299
Anne Väisänen, Satu Hopponen
Psycholinguistic research has sought to find reliable cues of deception in speech, but the results have been mixed. Prosody varies between individuals, situations, and languages, which makes comparisons difficult, and research designs also differ considerably. We examined vocal cues that distinguish deceptive from truthful speech in Finnish by analyzing acoustic-prosodic features collected from 20 native Finnish speakers in a neutral, low-stakes picture description setting.
Using a within-subject design, participants were randomly assigned to start with either the deceptive or truthful task. Additionally, we studied whether acoustic-prosodic features correlated with participants’ beliefs about deception cues and their experiences of motivation, success, and nervousness during the deceptive task.
We found that lying increased pitch range and mean intensity and produced a higher number of repetitions, along with a decrease in minimum pitch and response latency. Participants who believed that speech variability would increase in deceptive speech did exhibit such an increase. Further, average pitch was positively correlated with self-reported success but inversely correlated with nervousness; filled pauses showed the opposite pattern. Our findings support some aspects of both attempted control and cognitive load theories.
Contrary to some previous studies, we did not find a consistent increase in pitch and response latency in deceptive speech. Instead, individual variations in speech prosody were linked to subjective perceptions of nervousness and success. Additionally, neither attempted control nor cognitive load theory offers a comprehensive explanation of the changes we observed. We conclude that further research is needed in different contexts and languages.
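The acoustic-prosodic features named above (pitch range, minimum pitch, mean intensity) can be approximated with off-the-shelf tooling; the sketch below uses librosa as an assumed stand-in for the study's actual extraction pipeline, which is not specified in the abstract.

```python
# A minimal sketch of approximating three of the features analysed above
# (pitch range, minimum pitch, mean intensity) with librosa; this is an
# assumed toolchain, not the study's actual extraction pipeline.
import librosa
import numpy as np

def prosodic_features(wav_path: str):
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    voiced = f0[voiced_flag & ~np.isnan(f0)]
    pitch_range_hz = float(voiced.max() - voiced.min()) if voiced.size else 0.0
    min_pitch_hz = float(voiced.min()) if voiced.size else 0.0
    rms = librosa.feature.rms(y=y)[0]
    mean_intensity_db = float(np.mean(librosa.amplitude_to_db(rms, ref=1.0)))
    return {"pitch_range_hz": pitch_range_hz,
            "min_pitch_hz": min_pitch_hz,
            "mean_intensity_db": mean_intensity_db}
```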
Citations: 0
SF-AN: A lightweight shuffle Fourier attention network for multi-channel speech enhancement
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-14 · DOI: 10.1016/j.specom.2025.103296
Shiyun Xu, Yinghan Cao, Zehua Zhang, Yukun Qian, Changjun He, Mingjiang Wang
Models based on deep learning achieve remarkable results in multi-channel speech enhancement. However, they require many parameters and substantial computational resources, which challenges practical applications. This paper integrates the ideas of branch splitting and channel shuffling with Fourier convolution, proposing a lightweight shuffle Fourier attention network named SF-AN. In addition, we propose flexible channel attention, multi-scale spatial attention, and multi-dimensional collaborative attention, which effectively extract global channel and temporal-frequency information with few parameters and low computational cost. Based on the results from the L3DAS22 dataset, SF-AN achieves a Metric score of 0.933 with only 778.75 K parameters and 4.78 G MACs, outperforming other models that require more computational resources. Under various noise and reverberation conditions, SF-AN also exhibits outstanding denoising and dereverberation performance. On average, the PESQ of SF-AN improves from 3.359 to 3.420 in noisy environments, and from 2.931 to 3.328 in reverberant environments compared to EaBNet. When both noise and reverberation are present, SF-AN achieves a PESQ score of 2.864, a STOI score of 0.887, and a SI-SDR score of 6.233. SF-AN is capable of effectively suppressing noise and reverberation while maintaining a very lightweight structure, thereby enhancing the listening experience for listeners.
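Two of the building blocks named above, channel shuffling between branches and Fourier convolution, can be sketched generically; the snippet below is an illustrative PyTorch rendering of those operations under assumed tensor shapes, not the SF-AN architecture itself.

```python
# A generic sketch of a channel shuffle (as popularized by ShuffleNet) and a
# Fourier-domain convolution (pointwise mixing of FFT coefficients along the
# frequency axis). Shapes are assumed (B, C, T, F) time-frequency tensors.
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    b, c, t, f = x.shape
    x = x.view(b, groups, c // groups, t, f)
    return x.transpose(1, 2).reshape(b, c, t, f)

class FourierConv(nn.Module):
    """1x1 mixing applied to the real/imaginary parts of the frequency-axis FFT."""
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, C, T, F)
        spec = torch.fft.rfft(x, dim=-1)                      # complex (B, C, T, F//2+1)
        z = torch.cat([spec.real, spec.imag], dim=1)
        z = self.mix(z)
        real, imag = z.chunk(2, dim=1)
        return torch.fft.irfft(torch.complex(real, imag), n=x.shape[-1], dim=-1)

x = torch.randn(1, 8, 100, 64)
y = FourierConv(8)(channel_shuffle(x, groups=2))
print(y.shape)  # torch.Size([1, 8, 100, 64])
```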
Citations: 0
Automatic classification of vocal intensity categories from amplitude-normalized speech signals by comparing acoustic features and classifier models
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-11 · DOI: 10.1016/j.specom.2025.103288
Manila Kodali , Luna Ansari , Sudarsana Reddy Kadiri , Shrikanth Narayanan , Paavo Alku
Regulation of vocal intensity is a fundamental phenomenon in speech communication. Speakers use different intensity categories (e.g., soft, normal, and loud voice) to generate different vocal emotions or to communicate in noisy conditions or over varying distances. Vocal intensity categories have been studied in fundamental research of speech, but much less is known about their automatic classification. This study investigates the classification of vocal intensity categories from speech signals in a scenario, where the original level information of speech is absent and the signal is presented on a normalized amplitude scale. Different acoustic features were studied together with machine learning (ML) and deep learning (DL) classifiers using two different labeling approaches. Speech signals recorded from 50 speakers reciting sentences in four intensity categories (soft, normal, loud, and very loud) were analyzed. Altogether 15 feature sets including different cepstral, spectral and handcrafted (eGeMAPS) features were compared. Three ML classifiers (support vector machine, random forest and AdaBoost), and four DL classifiers (deep neural network, convolutional neural network, recurrent neural network and bidirectional long short-term memory network) were compared. The best classification accuracy of 86.0% was obtained by combining the best performing cepstral and spectral features and using the bidirectional long short-term memory classifier.
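As a minimal illustration of the classical-ML side of the comparison (fixed-length feature vectors fed to an SVM), the sketch below uses a synthetic feature matrix standing in for the cepstral, spectral, and eGeMAPS sets; it is not the authors' experimental code.

```python
# Minimal sketch: four-way intensity-category classification with an SVM on
# synthetic feature vectors (stand-ins for eGeMAPS-like utterance features).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))      # hypothetical 88-dim utterance-level features
y = rng.integers(0, 4, size=200)    # labels: 0=soft, 1=normal, 2=loud, 3=very loud

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```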
Citations: 0
TranSTYLer: Multimodal behavioural style transfer for facial and body gestures generation
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-08-05 · DOI: 10.1016/j.specom.2025.103286
Mireille Fares , Catherine Pelachaud , Nicolas Obin
This paper addresses the challenge of transferring the behaviour expressivity style of one virtual agent to another while preserving behaviour shape, as shapes carry communicative meaning. Behaviour expressivity style is viewed here as the qualitative properties of behaviours. We propose TranSTYLer, a multimodal transformer-based model that synthesises the multimodal behaviours of a source speaker with the style of a target speaker. We assume that behaviour expressivity style is encoded across various modalities of communication, including text, speech, body gestures, and facial expressions. The model employs a style-content disentanglement schema to ensure that the transferred style does not interfere with the meaning conveyed by the source's behaviours. Our approach eliminates the need for style labels and allows generalisation to styles not seen during the training phase. We train our model on the PATS corpus, which we extended to include dialogue acts and 2D facial landmarks. Objective and subjective evaluations show that our model outperforms state-of-the-art models in style transfer for both seen and unseen styles during training. To tackle the issues of style and content leakage that may arise, we propose a methodology to assess the degree to which behaviour and gestures associated with the target style are successfully transferred while ensuring the preservation of the ones related to the source content.
Citations: 0
Assessing Cancer-Related Cognitive Impairment for breast cancer survivors with speech analysis
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-07-29 · DOI: 10.1016/j.specom.2025.103284
Amélie B. Richard , Manon Lelandais , Sophie Jacquin-Courtois

Background

This paper presents a first study on the assessment of Cancer-Related Cognitive Impairment (CRCI) with speech analysis. CRCI is a change in cognitive performance reported by cancer patients that is too subtle to be detected by the tests currently in use. Speech analysis might provide a solution to this issue: it can be instrumental in detecting subtle cognitive impairment, as speech contains fine-grained segmental and suprasegmental parameters that are sensitive to changes in cognition. Pauses in particular have been highlighted as potential behavioral markers of neurocognitive disorders. However, the absence of simple, detailed methods compromises the feasibility of speech analyses in clinical practice.

Objectives

This study aims (i) to identify breast cancer survivors with CRCI using a new practical method centered on speech pauses, and (ii) to provide the detailed protocol that supports this study, which is intended to be applicable in clinical contexts.

Methods

Thirty-three breast cancer survivors with a cognitive complaint, eleven breast cancer survivors without a cognitive complaint and thirteen controls were included in the study. Participants were instructed to tell a picture-based story. Their narratives were recorded, automatically transcribed with Whisper, and analyzed using the SPPAS and Praat software. Silent pauses, filled pauses, and sustained vowels were annotated, then processed in RStudio for statistical analysis.

Results

Only silent pause duration was significantly longer for breast cancer survivors with a cognitive complaint than for controls.

Conclusions

The results suggest that silent pause duration is a good marker for detecting CRCI. Automating the transcription and annotation of speech data improves the feasibility of speech analysis in clinical contexts, although a manual check is required.
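The Methods above rely on annotating silent pauses; the sketch below shows one way to measure silent-pause durations from an energy contour with librosa, as an assumed stand-in for the SPPAS/Praat annotation step, with an illustrative silence threshold and minimum pause length.

```python
# A minimal sketch of measuring silent-pause durations from an energy contour.
# The top_db threshold and minimum pause duration are illustrative assumptions,
# not the values used in the study.
import librosa

def silent_pause_durations(wav_path, top_db=35, min_pause_s=0.25):
    y, sr = librosa.load(wav_path, sr=16000)
    # Non-silent intervals in samples; gaps between them are candidate pauses.
    speech = librosa.effects.split(y, top_db=top_db)
    pauses = []
    for (_, prev_end), (next_start, _) in zip(speech[:-1], speech[1:]):
        dur = (next_start - prev_end) / sr
        if dur >= min_pause_s:
            pauses.append(dur)
    return pauses  # mean or total duration can then be compared across groups
```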
Citations: 0
A cross-modal attention model with contextual enhancements for speech emotion recognition
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-07-25 · DOI: 10.1016/j.specom.2025.103285
Ruihua Qi , Chen Zhao , Xu Guo , Zhengguang Li , Shaohua Li , Heng Chen , Yunhao Sun
Speech emotion recognition remains challenging due to difficulties in modeling contextual dynamics and aligning asynchronous multimodal data. In this paper, we propose a context-enhanced cross-modal attention model, which integrates asynchronous speech-text alignment with hybrid fusion strategies. By combining feature-level and model-level fusion strategies, incorporating surrounding contextual information, and optimizing the cross-modal attention mechanism through asynchronous alignment of multimodal signals and weight refinement of feature mappings, the proposed model improves the ability to focus on critical emotional cues during training and achieve more robust contextual representations of both speech and text. Experiments on the IEMOCAP and MSP-IMPROV datasets achieve state-of-the-art results. The proposed model's robustness is further validated using ASR-generated inputs.
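Cross-modal attention of the kind described above, with speech frames attending to text tokens, can be sketched with a standard attention layer; the dimensions and residual fusion below are illustrative assumptions, not the paper's architecture.

```python
# A generic sketch of cross-modal attention: acoustic frames query token-level
# text embeddings, and the attended text is fused back into the speech stream.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, speech, text):
        # speech: (B, T_s, D) queries; text: (B, T_t, D) keys/values.
        fused, weights = self.attn(query=speech, key=text, value=text)
        return self.norm(speech + fused), weights

speech = torch.randn(2, 300, 256)   # e.g. frame-level acoustic embeddings
text = torch.randn(2, 40, 256)      # e.g. token-level text embeddings
out, attn_weights = CrossModalAttention()(speech, text)
print(out.shape, attn_weights.shape)  # (2, 300, 256) (2, 300, 40)
```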
Citations: 0
The effect of female and male stimuli on the evaluation of objective measures for noisy speech
IF 3 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-07-22 · DOI: 10.1016/j.specom.2025.103282
Siow Yong Low , He Qi , Cedric Ka-Fai Yiu
Objective measures such as the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility measure (STOI) have been widely used to evaluate the efficacy of a speech enhancement processor. Typically, the gender of the speech dataset used for these evaluations is not stated, as the speaker's gender is assumed to have no effect on the results. However, many studies on speech have shown that a speaker's gender influences overall speech quality and intelligibility, particularly in noisy environments. Despite this, there has been no investigation into the effects of gender on objective speech measures. This paper aims to fill this research gap by investigating the effect of speaker's gender on the outcome of objective speech measures for speech enhancement applications. If objective measures such as PESQ and STOI are indeed biased towards a particular gender, then the absolute evaluation of a speech processor could potentially be skewed toward a specific speech dataset. The results show that, in general, male speech consistently achieves a higher score in both PESQ and STOI measures compared to female speech in noisy environments. It was also found that the effect is more pronounced under poorer signal-to-noise ratio conditions. These results underscore the importance of speech diversity, i.e., including both female and male speech in datasets for future objective evaluations, to ensure truly non-biased assessments.
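For readers reproducing this kind of comparison, PESQ and STOI can be computed per utterance with the commonly used pesq and pystoi Python packages; the packages and file names below are assumptions, since the paper does not state its tooling.

```python
# Minimal sketch: scoring one degraded utterance against its clean reference.
# The file paths are hypothetical; both signals must share the sample rate
# (8 kHz or 16 kHz for PESQ) and length.
import soundfile as sf
from pesq import pesq     # pip install pesq
from pystoi import stoi   # pip install pystoi

clean, fs = sf.read("clean.wav")
degraded, _ = sf.read("degraded.wav")

print("PESQ (wideband):", pesq(fs, clean, degraded, "wb"))
print("STOI:", stoi(clean, degraded, fs, extended=False))
```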
Citations: 0
Towards robust heart failure detection in digital telephony environments by utilizing transformer-based codec inversion
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 ACOUSTICS · Pub Date: 2025-07-15 · DOI: 10.1016/j.specom.2025.103279
Saska Tirronen , Farhad Javanmardi , Hilla Pohjalainen , Sudarsana Reddy Kadiri , Kiran Reddy Mittapalle , Pyry Helkkula , Kasimir Kaitue , Mikko Minkkinen , Heli Tolppanen , Tuomo Nieminen , Paavo Alku
This study introduces the Codec Transformer Network (CTN) to enhance the reliability of automatic heart failure (HF) detection from coded telephone speech by addressing codec-related challenges in digital telephony. The study specifically addresses the codec mismatch between training and inference in HF detection. CTN is designed to map the mel-spectrogram representations of encoded speech signals back to their original, non-encoded forms, thereby recovering HF-related discriminative information. The effectiveness of CTN is demonstrated in conjunction with three HF detectors, based on Support Vector Machine, Random Forest, and K-Nearest Neighbors classifiers. The results show that CTN effectively retrieves the discriminative information between patients and controls, and performs comparably to or better than a baseline approach, based on multi-condition training.
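The codec-inversion idea above, mapping mel-spectrograms of coded speech back toward their non-coded counterparts, can be sketched as a small sequence-to-sequence regressor; the layer sizes and L1 objective below are illustrative assumptions, not the CTN configuration.

```python
# A small sketch of mel-spectrogram codec inversion: a Transformer encoder maps
# mel frames of codec-processed speech toward the corresponding non-coded
# frames, trained with a frame-wise regression loss. All hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn

class CodecInverter(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj_in = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj_out = nn.Linear(d_model, n_mels)

    def forward(self, coded_mel):                  # (B, T, n_mels)
        return self.proj_out(self.encoder(self.proj_in(coded_mel)))

model = CodecInverter()
coded = torch.randn(4, 200, 80)                    # mel frames of coded speech
clean = torch.randn(4, 200, 80)                    # corresponding non-coded targets
loss = nn.functional.l1_loss(model(coded), clean)  # frame-wise regression objective
loss.backward()
```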
Citations: 0