
Speech Communication: Latest Publications

A model of early word acquisition based on realistic-scale audiovisual naming events
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103169
Khazar Khorrami, Okko Räsänen
Infants gradually learn to parse continuous speech into words and connect names with objects, yet the mechanisms behind the development of early word perception skills remain unknown. We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input. We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that learns solely from statistical regularities in unannotated raw speech and pixel-level visual input. Crucially, the quantity of object naming events was carefully designed to match the quantity accessible to infants of comparable ages. Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants. The findings support the viability of general statistical learning for early word perception, demonstrating how learning can operate without assuming any prior linguistic capabilities.
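As a rough illustration of the kind of cross-modal statistical learning this abstract describes, the sketch below pairs utterance and image embeddings from co-occurring naming events under a symmetric contrastive objective. The encoder projections, dimensions, and loss are generic stand-ins, not the authors' published model.

```python
# Minimal sketch of learning word-object associations purely from co-occurring
# speech/image pairs (hypothetical encoders; not the authors' architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualAligner(nn.Module):
    def __init__(self, audio_dim=768, image_dim=512, shared_dim=256):
        super().__init__()
        # Stand-ins for raw-speech and pixel-level visual encoders.
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.image_proj = nn.Linear(image_dim, shared_dim)

    def forward(self, audio_emb, image_emb):
        a = F.normalize(self.audio_proj(audio_emb), dim=-1)
        v = F.normalize(self.image_proj(image_emb), dim=-1)
        return a, v

def naming_event_loss(a, v, temperature=0.07):
    """Symmetric contrastive loss: each utterance should match the object it names.

    Only co-occurrence statistics are used; no labels or lexicon.
    """
    logits = a @ v.t() / temperature           # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))          # i-th utterance pairs with i-th image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random stand-in embeddings for a batch of 8 naming events.
model = AudioVisualAligner()
a, v = model(torch.randn(8, 768), torch.randn(8, 512))
loss = naming_event_loss(a, v)
loss.backward()
```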
Citations: 0
HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103161
Nan Li, Meng Ge, Longbiao Wang, Yang-Hao Zhou, Jianwu Dang
Speech enhancement is critical for improving speech quality and intelligibility in a variety of noisy environments. While neural network-based methods have shown promising results in speech enhancement, they often suffer from performance degradation in scenarios with limited computational resources. This paper presents HC-APNet (Harmonic Compensation Auditory Perception Network), a novel lightweight approach tailored to exploit the perceptual capabilities of the human auditory system for efficient and effective speech enhancement, with a focus on harmonic compensation. Inspired by human auditory reception mechanisms, we first segment audio into subbands using an auditory filterbank for speech enhancement. The use of subbands helps to reduce the number of parameters and the computational load, while the auditory filterbank preserves the quality of the enhanced speech. In addition, inspired by human perception of auditory context, we develop an auditory perception network to capture gain information for the different subbands. Furthermore, considering that subband processing only applies gain to the spectral envelope, which may introduce harmonic distortion, we design a learnable multi-subband comb filter inspired by human pitch frequency perception to mitigate this distortion. Finally, our proposed HC-APNet model achieves competitive performance on speech quality evaluation metrics with significantly fewer computational and parameter resources than existing methods on the VCTK + DEMAND and DNS Challenge datasets.
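The sketch below illustrates the general subband-gain idea described above: collapse a noisy magnitude spectrum into auditory-style subbands, predict one gain per subband, and expand the gains back to the linear-frequency axis. It uses a mel filterbank and a tiny gain network as stand-ins and is not the HC-APNet architecture itself.

```python
# Minimal sketch of subband-gain enhancement with an auditory-style filterbank
# (mel filters as a stand-in; the small gain network is hypothetical).
import librosa
import torch
import torch.nn as nn

n_fft, n_mels = 512, 32
fbank = torch.tensor(librosa.filters.mel(sr=16000, n_fft=n_fft, n_mels=n_mels),
                     dtype=torch.float32)             # (n_mels, n_fft//2 + 1)

gain_net = nn.Sequential(                              # predicts one gain per subband
    nn.Linear(n_mels, 64), nn.ReLU(),
    nn.Linear(64, n_mels), nn.Sigmoid())

def enhance_frame(noisy_mag):
    """noisy_mag: (n_fft//2 + 1,) magnitude spectrum of one STFT frame."""
    subband_energy = fbank @ noisy_mag                 # collapse spectrum into subbands
    gains = gain_net(torch.log1p(subband_energy))      # per-subband gains in [0, 1]
    # Expand subband gains back to the linear-frequency axis and apply them.
    spectral_gain = (fbank.t() @ gains) / (fbank.sum(dim=0) + 1e-8)
    return spectral_gain * noisy_mag

enhanced = enhance_frame(torch.rand(n_fft // 2 + 1))
print(enhanced.shape)                                  # torch.Size([257])
```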
Citations: 0
Effects of voice onset time and place of articulation on perception of dichotic Turkish syllables
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103170
Emre Eskicioglu, Serhat Taslica, Cagdas Guducu, Adile Oniz, Murat Ozgoren
Dichotic listening has been widely used in research investigating the hemispheric specialization of language. A common finding is the right-ear advantage (REA), reflecting left-hemisphere specialization for speech sound perception. However, acoustic/phonetic features of the stimuli, such as voice onset time (VOT) and place of articulation (POA), are known to affect the REA. This study investigates the effects of these features on the REA in Turkish, whose language family differs from the languages typically used in previous VOT and POA studies. Data from 95 right-handed participants with an REA, defined as reporting at least one more correct right-ear than left-ear response, were analyzed. Prevoiced consonants were dominant compared with consonants with long VOT and resulted in increased REA. Velar consonants were dominant compared with other consonants. Velar and alveolar consonants resulted in a higher REA than bilabial consonants. Lateralization and error rates were lower when the consonants differed in POA but not in VOT. Error responses were mostly determined by the VOT feature of the consonant presented to the right ear. In conclusion, the effects of VOT and POA on hemispheric asymmetry in Turkish were demonstrated with a behavioral approach. Further neuroimaging or electrophysiological investigations are needed to validate these effects and shed light on the underlying mechanisms of VOT and POA during the dichotic listening test.
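For readers unfamiliar with dichotic-listening scoring, the snippet below shows how a right-ear advantage and the conventional (R - L) / (R + L) laterality index can be computed from correct-report counts; the study's exact scoring procedure may differ.

```python
# Simple computation of the right-ear advantage (REA) and a laterality index
# from dichotic-listening scores; (R - L) / (R + L) is the conventional index,
# not necessarily the exact scoring used in the study above.
def laterality_index(correct_right: int, correct_left: int) -> float:
    total = correct_right + correct_left
    if total == 0:
        return 0.0
    return (correct_right - correct_left) / total

def has_rea(correct_right: int, correct_left: int) -> bool:
    # Inclusion criterion described above: at least one more correct
    # right-ear report than left-ear report.
    return correct_right >= correct_left + 1

print(laterality_index(22, 14))   # 0.222... -> right-ear advantage
print(has_rea(22, 14))            # True
```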
Citations: 0
Spoken language identification: An overview of past and present research trends
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103167
Douglas O'Shaughnessy
Identification of the language used in spoken utterances is useful for multiple applications, e.g., assisting in directing or automating telephone calls, or selecting which language-specific speech recognizer to use. This paper reviews modern methods of automatic language identification. It examines what information in speech helps to distinguish among languages, and extends these ideas to dialect estimation as well. As approaches to recognizing languages often share much in common with both automatic speech recognition and speaker verification, these three processes are compared. Many methods are drawn from pattern recognition research in other areas, such as image and text recognition. This paper notes how speech differs from most other signals to be recognized, and how language identification differs from other speech applications. While it is mainly addressed to readers who are not experts in speech processing (detailed algorithms, readily found in the cited literature, are omitted), the presentation covers a wide-ranging discussion useful to experts as well.
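As a schematic example of the kind of pipeline such a review surveys, the sketch below pools frame-level acoustic features into an utterance representation and classifies it over a small, illustrative set of languages; it is a generic stand-in rather than a method from the paper.

```python
# Generic sketch of a spoken language identification back-end: pool frame-level
# features into one utterance vector, then classify over a fixed language set.
# Everything here (feature dimension, language list) is illustrative.
import torch
import torch.nn as nn

LANGUAGES = ["en", "fr", "zh", "tr"]

class SimpleLID(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_langs=len(LANGUAGES)):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_langs)

    def forward(self, frames):                  # frames: (batch, time, feat_dim)
        _, last_hidden = self.encoder(frames)   # (1, batch, hidden)
        return self.classifier(last_hidden.squeeze(0))

model = SimpleLID()
logits = model(torch.randn(2, 300, 40))         # two utterances of 40-dim features
print(logits.softmax(dim=-1).shape)             # torch.Size([2, 4])
```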
Citations: 0
Systematic review: The identification of segmental Mandarin-accented English features
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103168
Hongzhi Wang, Rachael-Anne Knight, Lucy Dipper, Roy Alderton, Reem S. W. Alyahya

Background

The pronunciation of L2 English by L1 Mandarin speakers is influenced by transfer effects from the phonology of Mandarin. However, there is a research gap in systematically synthesizing and reviewing segmental Mandarin-accented English features (SMAEFs) from the existing literature. An accurate and comprehensive description of SMAEFs is necessary for applied science in relevant fields.

Aim

To identify the segmental features that are most consistently described as characteristic of Mandarin-accented English in previous literature.

Methods

A systematic review was conducted. Studies were identified by searching nine databases and applying eight screening criteria.

Results

The systematic review includes nineteen studies with a total of 1,873 Mandarin speakers of English. The included studies yield 45 SMAEFs, classified into vowel and consonant categories, each with multiple sub-categories. The results are supported by evidence of varying strength. Four frequently reported findings were identified and analyzed in terms of their potential intelligibility outcomes: 1) variations in vowel height and frontness, 2) schwa epenthesis, 3) variations in closure duration in plosives, and 4) illegal consonant deletion.

Conclusion

The number of SMAEFs is large. These features occur in numerous traditional phonetic categories as well as in two categories (i.e., schwa epenthesis and illegal consonant deletion) that are typically used to describe features of connected speech. The study outcomes may provide valuable insights for researchers and practitioners in English Language Teaching, phonetics, and speech recognition system development, helping them select the pronunciation features to focus on in teaching and research, or supporting the identification of accented features.
Citations: 0
Self-distillation-based domain exploration for source speaker verification under spoofed speech from unknown voice conversion
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103153
Xinlei Ma, Ruiteng Zhang, Jianguo Wei, Xugang Lu, Junhai Xu, Lin Zhang, Wenhuan Lu
Advancements in voice conversion (VC) technology have made it easier to generate spoofed speech that closely resembles the identity of a target speaker. Meanwhile, verification systems within the realm of speech processing are widely used to identify speakers. However, the misuse of VC algorithms poses significant privacy and security risks by potentially deceiving these systems. To address this issue, source speaker verification (SSV) has been proposed to verify the source speaker's identity behind spoofed speech generated by VC. Nevertheless, SSV often suffers severe performance degradation when confronted with unknown VC algorithms, a problem usually neglected by researchers. To handle this cross-voice-conversion scenario and enhance the model's performance when facing unknown VC methods, we redefine it as a novel domain adaptation task by treating each VC method as a distinct domain. In this context, we propose an unsupervised domain adaptation (UDA) algorithm termed self-distillation-based domain exploration (SDDE). This algorithm adopts a Siamese framework with two branches: one trained on the source (known) domain and the other trained on the target domains (unknown VC methods). The branch trained on the source domain leverages supervised learning to capture the source speaker's intrinsic features. Meanwhile, the branch trained on the target domain employs self-distillation to explore target-domain information from multi-scale segments. Additionally, we have constructed a large-scale data set comprising over 7,945 hours of spoofed speech to evaluate the proposed SDDE. Experimental results on this data set demonstrate that SDDE outperforms traditional UDA methods and substantially enhances the performance of the SSV model under unknown VC scenarios. The code for data generation and the trial lists are available at https://github.com/zrtlemontree/cross-domain-source-speaker-verification.
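A simplified sketch of the two-branch idea described above is given below: a supervised branch on the labeled source domain, and a self-distillation branch that matches predictions for segments of different scales from the same unlabeled target-domain utterance. The encoder, pooling, and losses are illustrative stand-ins; the released SDDE code is linked above.

```python
# Simplified two-branch sketch: a supervised source-domain branch plus a
# self-distillation branch that pulls a short segment's prediction toward the
# prediction for a longer segment of the same utterance. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 192))
speaker_head = nn.Linear(192, 100)          # 100 source speakers (illustrative)

def source_branch_loss(feats, speaker_ids):
    """Supervised loss on the known (source) domain."""
    emb = encoder(feats).mean(dim=1)         # average pooling over time
    return F.cross_entropy(speaker_head(emb), speaker_ids)

def self_distillation_loss(long_seg, short_seg, temperature=0.1):
    """Target-domain branch: the short segment's output is matched to the
    (detached) output for a longer segment of the same utterance."""
    teacher = F.softmax(speaker_head(encoder(long_seg).mean(dim=1)).detach() / temperature, dim=-1)
    student = F.log_softmax(speaker_head(encoder(short_seg).mean(dim=1)) / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Toy batch: 4 utterances of 80-dim frame features.
src = source_branch_loss(torch.randn(4, 200, 80), torch.randint(0, 100, (4,)))
tgt = self_distillation_loss(torch.randn(4, 200, 80), torch.randn(4, 50, 80))
(src + tgt).backward()
```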
Citations: 0
Improved AED with multi-stage feature extraction and fusion based on RFAConv and PSA
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-02-01 · DOI: 10.1016/j.specom.2024.103166
Bingbing Wang, Yangjie Wei, Zhuangzhuang Wang, Zekang Qi
End-to-end speech recognition systems based on the Attention-based Encoder-Decoder (AED) model normally achieve high accuracy because they concurrently consider the previously generated tokens and contextual features of speech signals. However, the spatial, positional, and multiscale information during shallow feature extraction is mostly neglected, and the shallow and deep features are rarely effectively fused. These problems seriously limit the accuracy and speed of speech recognition in real applications. This study proposes a multi-stage feature extraction and fusion method tailored for end-to-end speech recognition systems based on the AED model. Initially, the receptive-field attention convolutional module is introduced into the front-end feature extraction stage of AED. This module employs a receptive field attention mechanism to enhance the model's feature extraction capability by focusing on the positional and spatial information of speech signals. Moreover, a pyramid squeeze attention mechanism is incorporated into the encoder module to effectively merge the shallow and deep features, and feature maps are recalibrated through weight learning to enhance the accuracy of the encoder's output features. Finally, the effectiveness and robustness of our method are validated across various end-to-end speech recognition models. The experimental results prove that our improved AED speech recognition models with multi-stage feature extraction and fusion achieve a lower word error rate without a language model, and their transcriptions are more accurate and grammatically precise.
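The sketch below shows a generic multi-scale channel-attention block of the kind described above: parallel convolutions at several kernel sizes followed by learned channel gains that recalibrate the feature maps. Layer sizes are illustrative, and this is not the paper's RFAConv or PSA implementation.

```python
# Generic multi-scale channel-attention sketch: convolve the input at several
# kernel sizes, then reweight the concatenated feature maps with learned
# channel gains. Illustrative stand-in, not the paper's RFAConv/PSA modules.
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    def __init__(self, in_ch=1, branch_ch=8, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernel_sizes])
        total = branch_ch * len(kernel_sizes)
        self.gate = nn.Sequential(                 # squeeze -> excitation gains
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(total, total // 2), nn.ReLU(),
            nn.Linear(total // 2, total), nn.Sigmoid())

    def forward(self, x):                          # x: (batch, 1, time, freq)
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        gains = self.gate(feats).unsqueeze(-1).unsqueeze(-1)
        return feats * gains                       # recalibrated feature maps

block = MultiScaleChannelAttention()
out = block(torch.randn(2, 1, 100, 80))            # batch of 2 spectrograms
print(out.shape)                                   # torch.Size([2, 24, 100, 80])
```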
Citations: 0
One-class network leveraging spectro-temporal features for generalized synthetic speech detection
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-01-25 · DOI: 10.1016/j.specom.2025.103200
Jiahong Ye, Diqun Yan, Songyin Fu, Bin Ma, Zhihua Xia
Synthetic speech attacks pose significant threats to Automatic Speaker Verification (ASV) systems. To counter these, various detection systems have been developed. However, these models often struggle with reduced accuracy when encountering novel spoofing attacks during testing. To address this issue, this paper proposes a One-Class Network architecture that leverages features extracted from the log power spectrum of the F0 subband. We have developed an advanced spectro-temporal enhancement module, comprising the Temporal Correlation Integrate Module (TCIM) and the Frequency-Adaptive Dependency Module (FADM), to accurately capture F0 subband details. TCIM captures crucial temporal dynamics and models the long-term dependencies characteristic of the F0 signals within the F0 subband. Meanwhile, FADM employs a frequency-adaptive mechanism to identify critical frequency bands, allowing the detection system to conduct a thorough and detailed analysis. Additionally, we introduce a KLOC-Softmax loss function that incorporates the KoLeo regularizer. This function promotes a uniform distribution of features within batches, effectively addressing intra-class imbalance and aiding balanced optimization. Experimental results on the ASVspoof 2019 LA dataset show that our approach achieves an equal error rate (EER) of 0.38% and a minimum tandem detection cost function (min t-DCF) of 0.0127. Our method outperforms most state-of-the-art speech anti-spoofing techniques and demonstrates robust generalizability to previously unseen types of synthetic speech attacks.
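The KoLeo regularizer mentioned above is, in its common formulation, a nearest-neighbor differential-entropy estimator that penalizes embeddings crowding together within a batch. The sketch below shows that generic formulation; the exact KLOC-Softmax combination used in the paper may differ.

```python
# KoLeo-style regularizer: for each embedding in the batch, penalize -log of
# its distance to its nearest neighbor, pushing the batch toward a uniform
# spread on the hypersphere. Generic formulation for illustration only.
import torch
import torch.nn.functional as F

def koleo_regularizer(embeddings: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    x = F.normalize(embeddings, dim=-1)               # work on the unit sphere
    n = x.size(0)
    dists = torch.cdist(x, x) + torch.eye(n) * 1e9    # mask out self-distances
    nearest = dists.min(dim=-1).values                # each sample's nearest neighbor
    return -torch.log(nearest + eps).mean()

emb = torch.randn(16, 128, requires_grad=True)        # a batch of 16 embeddings
loss = koleo_regularizer(emb)
loss.backward()
print(float(loss))
```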
Citations: 0
Effects of harmonicity on Mandarin speech perception in cochlear implant users
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-01-24 · DOI: 10.1016/j.specom.2025.103199
Mingyue Shi, Qinglin Meng, Huali Zhou, Jiawen Li, Yefei Mo, Nengheng Zheng
Previous research has demonstrated the negligible impact of harmonicity on English speech perception for normal-hearing (NH) listeners in quiet environments. This study aims to bridge the gap in understanding the role of harmonicity in Mandarin speech perception for cochlear implant (CI) users. Speech perception in quiet was tested in both a CI simulation group and an actual CI user group using harmonic and inharmonic Mandarin speech. Furthermore, speech-on-speech perception was tested in NH, CI simulation, and actual CI user groups. For speech perception in quiet, results show that, compared to harmonic speech, inharmonic speech decreased the mean recognition rate for both the actual CI user and CI simulation groups by about 10 percentage points. For speech-on-speech perception, all groups (i.e., NH, CI simulation, and actual CI user) performed worse with inharmonic stimuli than with harmonic stimuli. The findings of this study, along with previous studies in NH listeners, indicate that harmonicity aids target speech recognition for NH listeners in speech-on-speech conditions but not in speech perception in quiet. In contrast, harmonicity plays an important role in CI users' Mandarin speech recognition in both quiet and speech-on-speech conditions. However, under speech-on-speech conditions, CI users could only understand target speech at positive SNRs (often > 5 dB), suggesting that their performance depends on the intelligibility of the target speech. The contribution of harmonicity to masking release in CI users remains unclear.
Citations: 0
Coordination Attention based Transformers with bidirectional contrastive loss for multimodal speech emotion recognition
IF 2.4 · CAS Tier 3 (Computer Science) · Q2 (Acoustics) · Pub Date: 2025-01-23 · DOI: 10.1016/j.specom.2025.103198
Weiquan Fan, Xiangmin Xu, Guohua Zhou, Xiaofang Deng, Xiaofen Xing
Emotion recognition is crucial for improving the human–computer interaction experience. Attention mechanisms have become a mainstream technique due to their excellent ability to capture emotion representations. Existing algorithms often employ self-attention and cross-attention for multimodal interactions, which artificially fixes specific attention patterns at specific layers of the model. However, it is uncertain which attention mechanism is more important at different layers of the model. In this paper, we propose Coordination Attention based Transformers (CAT). Based on a dual-attention paradigm, CAT dynamically infers the pass rates of self-attention and cross-attention layer by layer, coordinating the importance of intra-modal and inter-modal factors. Further, we propose a bidirectional contrastive loss that clusters matching pairs between modalities and pushes mismatching pairs farther apart. Experiments demonstrate the effectiveness of our method, and state-of-the-art performance is achieved under the same experimental conditions.
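The sketch below illustrates the coordination idea in a single layer: self-attention and cross-attention outputs are mixed through learned pass rates predicted by a small gate. Dimensions, the gating network, and the residual wiring are illustrative stand-ins, not the paper's CAT implementation.

```python
# Hedged sketch of coordinating self- and cross-attention with learned pass
# rates: a small gate decides, per layer, how much of each attention output to
# let through. Simplified illustration only.
import torch
import torch.nn as nn

class CoordinatedAttentionLayer(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, 2), nn.Sigmoid())  # two pass rates

    def forward(self, speech, text):
        sa, _ = self.self_attn(speech, speech, speech)      # intra-modal
        ca, _ = self.cross_attn(speech, text, text)         # inter-modal
        rates = self.gate(speech.mean(dim=1))               # (batch, 2), one per path
        mixed = rates[:, :1, None] * sa + rates[:, 1:, None] * ca
        return speech + mixed                                # residual connection

layer = CoordinatedAttentionLayer()
out = layer(torch.randn(2, 120, 256), torch.randn(2, 30, 256))
print(out.shape)                                             # torch.Size([2, 120, 256])
```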
Citations: 0