
Speech Communication: Latest Publications

Comparison and analysis of new curriculum criteria for end-to-end ASR
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-31 | DOI: 10.1016/j.specom.2024.103113

Traditionally, teaching a human and a Machine Learning (ML) model is quite different, but organized and structured learning can enable faster and better understanding of the underlying concepts. For example, when humans learn to speak, they first learn how to utter basic phones and then slowly move towards more complex structures such as words and sentences. Motivated by this observation, researchers have started to adapt this approach for training ML models. Since the main concept, the gradual increase in difficulty, resembles the notion of the curriculum in education, the methodology became known as Curriculum Learning (CL). In this work, we design and test new CL approaches to train Automatic Speech Recognition systems, specifically focusing on the so-called end-to-end models. These models consist of a single, large-scale neural network that performs the recognition task, in contrast to the traditional way of having several specialized components focusing on different subtasks (e.g., acoustic and language modeling). We demonstrate that end-to-end models can achieve better performance if they are provided with an organized training set consisting of examples that exhibit an increasing level of difficulty. To impose structure on the training set and to define the notion of an easy example, we explored multiple solutions that use either external, static scoring methods or incorporate feedback from the model itself. In addition, we examined the effect of pacing functions that control how much data is presented to the network during each training epoch. Our proposed curriculum learning strategies were tested on the task of speech recognition on two data sets, one containing spontaneous Finnish speech where volunteers were asked to speak about a given topic, and one containing planned English speech. Empirical results showed that a good curriculum strategy can yield performance improvements and speed up convergence. After a given number of epochs, our best strategy achieved a 5.6% and 3.4% decrease in terms of test set word error rate for the Finnish and English data sets, respectively.
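
A minimal, illustrative Python sketch of the two ingredients described above, assuming a static external difficulty score and a linear pacing schedule; the function names, the schedule, and the toy difficulty measure are assumptions for illustration, not the authors' implementation.

```python
import random

def linear_pacing(epoch, total_epochs, start_frac=0.2):
    """Fraction of the difficulty-sorted training set visible at a given epoch."""
    frac = start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1)
    return min(1.0, frac)

def curriculum_batches(examples, difficulty, epoch, total_epochs, batch_size=8):
    """Yield batches drawn from the easiest examples first.

    `difficulty` is any external, static score (lower = easier), e.g. utterance
    length or the loss of a small scorer model.
    """
    ordered = sorted(examples, key=difficulty)
    n_visible = max(batch_size, int(len(ordered) * linear_pacing(epoch, total_epochs)))
    visible = ordered[:n_visible]
    random.shuffle(visible)                      # shuffle inside the visible subset
    for i in range(0, len(visible), batch_size):
        yield visible[i:i + batch_size]

# toy usage: difficulty = number of words in the transcript
utterances = ["hi", "how are you", "speech recognition is fun",
              "a rather long spontaneous sentence about a given topic"]
for epoch in range(3):
    for batch in curriculum_batches(utterances, lambda u: len(u.split()),
                                    epoch, total_epochs=3, batch_size=2):
        pass  # the ASR model would be trained on `batch` here
```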

Citations: 0
Tone-syllable synchrony in Mandarin: New evidence and implications
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-31 | DOI: 10.1016/j.specom.2024.103121

Recent research has shown evidence based on a minimal contrast paradigm that consonants and vowels are articulatorily synchronized at the onset of the syllable. What remains less clear is the laryngeal dimension of the syllable, for which evidence of tone synchrony with the consonant-vowel syllable has been circumstantial. The present study assesses the precise tone-vowel alignment in Mandarin Chinese by applying the minimal contrast paradigm. The vowel onset is determined by detecting divergence points of F2 trajectories between a pair of disyllabic sequences with two contrasting vowels, and the onsets of tones are determined by detecting divergence points of f0 trajectories in contrasting disyllabic tone pairs, using generalized additive mixed models (GAMMs). The alignment of the divergence-determined vowel and tone onsets is then evaluated with linear mixed effect models (LMEMs) and their synchrony is validated with Bayes factors. The results indicate that tone and vowel onsets are fully synchronized. There is therefore evidence for strict alignment of consonant, vowel and tone as hypothesized in the synchronization model of the syllable. Also, with the newly established tone onset, the previously reported ‘anticipatory raising’ effect of tone now appears to occur within rather than before the articulatory syllable. Implications of these findings will be discussed.
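
The paper locates divergence points with GAMMs; as a rough illustration of the same idea under a simpler method, the sketch below finds the first time point at which two sets of trajectories (e.g. f0 tracks for two contrasting tones, or F2 tracks for two contrasting vowels) reliably diverge, using a bootstrap confidence interval of the mean difference. All names, toy numbers, and thresholds are assumptions.

```python
import numpy as np

def divergence_onset(traj_a, traj_b, times, n_boot=2000, alpha=0.05, seed=0):
    """First time point where the two trajectory sets reliably diverge.

    traj_a, traj_b: (n_repetitions, n_timepoints) arrays. A bootstrap CI of the
    mean difference is built per time point; the onset is the first point whose
    CI excludes zero.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        a = traj_a[rng.integers(0, len(traj_a), len(traj_a))].mean(axis=0)
        b = traj_b[rng.integers(0, len(traj_b), len(traj_b))].mean(axis=0)
        diffs.append(a - b)
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    significant = (lo > 0) | (hi < 0)
    return times[int(np.argmax(significant))] if significant.any() else None

# toy example: two f0 contours that start identical and split halfway through
times = np.linspace(0, 0.4, 41)
base = 120 + 30 * np.sin(np.pi * times / 0.4)
tone_a = base + np.random.default_rng(1).normal(0, 2, (20, 41))
tone_b = base + np.where(times > 0.2, 25.0, 0.0) + np.random.default_rng(2).normal(0, 2, (20, 41))
print(divergence_onset(tone_a, tone_b, times))  # roughly 0.2 s
```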

Citations: 0
Arabic Automatic Speech Recognition: Challenges and Progress
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-31 | DOI: 10.1016/j.specom.2024.103110

This paper provides a structured examination of Arabic Automatic Speech Recognition (ASR), focusing on the complexity posed by the language’s diverse forms and dialectal variations. We first explore the Arabic language forms, delimiting the challenges encountered with Dialectal Arabic, including issues such as code-switching and non-standardized orthography and, thus, the scarcity of large annotated datasets. Subsequently, we delve into the landscape of Arabic resources, distinguishing between Modern Standard Arabic (MSA) and Dialectal Arabic (DA) Speech Resources and highlighting the disparities in available data between these two categories. Finally, we analyze both traditional and modern approaches in Arabic ASR, assessing their effectiveness in addressing the unique challenges inherent to the language. Through this comprehensive examination, we aim to provide insights into the current state and future directions of Arabic ASR research and development.

Citations: 0
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-30 | DOI: 10.1016/j.specom.2024.103111

This study examined the vowel merger between the two vowels /e/ and /ɛ/ in Yanbian Korean. This sound change has already spread to Seoul Korean, particularly among speakers born after the 1970s. The aim of this study was to determine whether close exposure to Seoul Korean speakers leads to the neutralization of the distinction between the two vowels /e/ and /ɛ/. We recruited 20 Yanbian Korean speakers and asked them about their frequency of exposure to Seoul Korean. The exposure level of each participant was also recorded using a Likert scale. The results revealed that speakers with limited in-person interactions with Seoul Korean speakers exhibited distinct vowel productions within the vowel space. In contrast, those with frequent in-person interactions with Seoul Korean speakers tended to neutralize the two vowels, displaying considerably overlapping patterns in the vowel space. The relationship between the level of exposure to Seoul Korean and speakers’ vowel production was statistically confirmed by a linear regression analysis. Based on the results of this study, we speculate that the sound change in Yanbian Korean may become more widespread as Yanbian Korean speakers are increasingly exposed to Seoul Korean.
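
As a hedged illustration of the kind of analysis reported (the study itself fits a linear regression between exposure level and vowel production), the sketch below measures per-speaker /e/-/ɛ/ separation as the Euclidean distance between F1-F2 centroids and regresses it on a Likert exposure score; the distance measure and the toy numbers are assumptions, not the study's data.

```python
import numpy as np
from scipy import stats

def vowel_separation(f1_e, f2_e, f1_eh, f2_eh):
    """Euclidean distance (Hz) between the /e/ and /ɛ/ centroids in F1-F2 space."""
    centroid_e = np.array([np.mean(f1_e), np.mean(f2_e)])
    centroid_eh = np.array([np.mean(f1_eh), np.mean(f2_eh)])
    return float(np.linalg.norm(centroid_e - centroid_eh))

# one separation value and one Likert exposure score per speaker (toy numbers)
separation = np.array([180.0, 150.0, 95.0, 60.0, 35.0])
exposure = np.array([1, 2, 3, 4, 5])

fit = stats.linregress(exposure, separation)
print(f"slope = {fit.slope:.1f} Hz per exposure step, p = {fit.pvalue:.4f}")
```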

Citations: 0
Prosody in narratives: An exploratory study with children with sex chromosomes trisomies
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-26 | DOI: 10.1016/j.specom.2024.103107

Although language delays are common in children with sex chromosome trisomies [SCT], no studies have analysed their prosodic abilities. Considering the importance of prosody in communication, this exploratory study aims to analyse the prosodic features of the narratives of 4-year-old children with SCT.

Participants included 22 children with SCT and 22 typically developing [TD] children. The Narrative Competence Task was administered to elicit the child's narrative. Each utterance was prosodically analysed considering pitch and timing variables.

Considering pitch, the only difference was the number of movements since the utterances of children with SCT were characterised by a lower speech modulation. However, considering the timing variables, children with SCT produced a faster speech rate and a shorter final syllable duration than TD children.

Since both speech modulation and duration measures have important syntactic and pragmatic functions, further investigations should deeply analyse the prosodic skills of children with SCT in interaction with syntax and pragmatics.

Citations: 0
Progressive channel fusion for more efficient TDNN on speaker verification
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-23 | DOI: 10.1016/j.specom.2024.103105

ECAPA-TDNN is one of the most popular TDNNs for speaker verification. While most of the updates pay attention to building precisely designed auxiliary modules, the depth-first principle has shown promising performance recently. However, empirical experiments show that one-dimensional convolution (Conv1D) based TDNNs suffer from performance degradation by simply adding massive vanilla basic blocks. Since Conv1D naturally has a global receptive field (RF) on the feature dimension, progressive channel fusion (PCF) is proposed to alleviate this issue by introducing group convolution to build local RF and fusing the subbands progressively. Instead of reducing the group number in convolution layers used in the previous work, a novel channel permutation strategy is introduced to build information flow between groups so that all basic blocks in the model keep consistent parameter efficiency. The information leakage from lower-frequency bands to higher ones caused by Res2Block is simultaneously solved by introducing group-in-group convolution and using channel permutation. Besides the PCF strategy, redundant connections are removed for a more concise model architecture. The experiments on VoxCeleb and CnCeleb achieve state-of-the-art (SOTA) performance with an average relative improvement of 12.3% on EER and 13.2% on minDCF (0.01), validating the effectiveness of the proposed model.
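
A rough PyTorch sketch of the two mechanisms named above, group (subband) convolution plus a channel permutation that lets information flow between groups; the block sizes, residual layout, and permutation placement are assumptions for illustration rather than the paper's released architecture.

```python
import torch
import torch.nn as nn

def channel_permute(x, groups):
    """ShuffleNet-style permutation so the next grouped conv mixes channels across groups."""
    b, c, t = x.shape
    return x.view(b, groups, c // groups, t).transpose(1, 2).reshape(b, c, t)

class GroupedTDNNBlock(nn.Module):
    """Grouped 1-D conv block with channel permutation (a sketch of one PCF-style stage)."""
    def __init__(self, channels=512, kernel_size=3, groups=8):
        super().__init__()
        self.groups = groups
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=groups)
        self.bn = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):                 # x: (batch, channels, frames)
        y = self.act(self.bn(self.conv(x)))
        y = channel_permute(y, self.groups)
        return x + y                      # residual connection

x = torch.randn(4, 512, 200)
print(GroupedTDNNBlock()(x).shape)        # torch.Size([4, 512, 200])
```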

Citations: 0
Decoupled structure for improved adaptability of end-to-end models
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-23 | DOI: 10.1016/j.specom.2024.103109

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications. The E2E ASR model implicitly learns an internal language model (LM) which characterises the training distribution of the source domain, and the E2E trainable nature makes the internal LM difficult to adapt to the target domain with text-only data. To solve this problem, this paper proposes decoupled structures for attention-based encoder–decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models, which can achieve flexible domain adaptation in both offline and online scenarios while maintaining robust intra-domain performance. To this end, the acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component (i.e. internal LM) replaceable. When encountering a domain shift, the internal LM can be directly replaced during inference by a target-domain LM, without re-training or using domain-specific paired speech-text data. Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed that the proposed decoupled structure gave 15.1% and 17.2% relative word error rate reductions on the TED-LIUM 2 and AESRC2020 corpora while still maintaining performance on intra-domain data. It is also shown that the decoupled structure can be used to boost cross-domain speech translation quality while retaining the intra-domain performance.
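
A toy PyTorch sketch of the decoupling idea, assuming a decoder whose linguistic component is an explicit, swappable module; the real Decoupled-AED and Decoupled-Transducer decoders are more involved, and every class name and dimension here is an assumption for illustration.

```python
import torch
import torch.nn as nn

class InternalLM(nn.Module):
    """Linguistic component: predicts the next token from the token history only."""
    def __init__(self, vocab=1000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, tokens):             # (batch, len) -> (batch, len, dim)
        out, _ = self.rnn(self.emb(tokens))
        return out

class DecoupledDecoder(nn.Module):
    """Acoustic and linguistic parts are kept separate, so `lm` is replaceable."""
    def __init__(self, lm, vocab=1000, dim=256):
        super().__init__()
        self.lm = lm                        # internal LM (source domain by default)
        self.acoustic_proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, enc, tokens):         # enc: (batch, len, dim) acoustic context
        return self.out(self.acoustic_proj(enc) + self.lm(tokens))

decoder = DecoupledDecoder(InternalLM())
# Domain adaptation with text-only data: train a new InternalLM on target-domain
# text, then swap it in at inference without touching the acoustic part:
decoder.lm = InternalLM()                   # stands in for a target-domain LM
logits = decoder(torch.randn(2, 5, 256), torch.randint(0, 1000, (2, 5)))
print(logits.shape)                         # torch.Size([2, 5, 1000])
```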

Citations: 0
Speechformer-CTC: Sequential modeling of depression detection with speech temporal classification
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-18 | DOI: 10.1016/j.specom.2024.103106

Speech-based automatic depression detection systems have been extensively explored over the past few years. Typically, each speaker is assigned a single label (Depressive or Non-depressive), and most approaches formulate depression detection as a speech classification task without explicitly considering the non-uniformly distributed depression pattern within segments, leading to low generalizability and robustness across different scenarios. However, depression corpora do not provide fine-grained labels (at the phoneme or word level) which makes the dynamic depression pattern in speech segments harder to track using conventional frameworks. To address this, we propose a novel framework, Speechformer-CTC, to model non-uniformly distributed depression characteristics within segments using a Connectionist Temporal Classification (CTC) objective function without the necessity of input–output alignment. Two novel CTC-label generation policies, namely the Expectation-One-Hot and the HuBERT policies, are proposed and incorporated in objectives on various granularities. Additionally, experiments using Automatic Speech Recognition (ASR) features are conducted to demonstrate the compatibility of the proposed method with content-based features. Our results show that the performance of depression detection, in terms of Macro F1-score, is improved on both DAIC-WOZ (English) and CONVERGE (Mandarin) datasets. On the DAIC-WOZ dataset, the system with HuBERT ASR features and a CTC objective optimized using HuBERT policy for label generation achieves 83.15% F1-score, which is close to state-of-the-art without the need for phoneme-level transcription or data augmentation. On the CONVERGE dataset, using Whisper features with the HuBERT policy improves the F1-score by 9.82% on CONVERGE1 (in-domain test set) and 18.47% on CONVERGE2 (out-of-domain test set). These findings show that depression detection can benefit from modeling non-uniformly distributed depression patterns and the proposed framework can be potentially used to determine significant depressive regions in speech utterances.
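
The paper's Expectation-One-Hot and HuBERT label-generation policies are not reproduced here; the sketch below only shows the mechanical part in PyTorch, assuming a placeholder policy that repeats the utterance-level label a few times as the CTC target and then applies torch.nn.CTCLoss to frame-level posteriors.

```python
import torch
import torch.nn as nn

# classes: 0 = CTC blank, 1 = non-depressive, 2 = depressive
ctc_loss = nn.CTCLoss(blank=0)

def make_targets(segment_label, repeats=3):
    """Placeholder label policy: repeat the utterance-level label a few times and
    let CTC decide where in the segment the supporting frames lie."""
    return torch.full((repeats,), segment_label, dtype=torch.long)

frames, batch, classes = 120, 4, 3
logits = torch.randn(frames, batch, classes, requires_grad=True)   # stand-in acoustic model output
log_probs = logits.log_softmax(dim=-1)                             # (T, N, C), as CTCLoss expects

targets = torch.stack([make_targets(2), make_targets(1), make_targets(2), make_targets(1)])
input_lengths = torch.full((batch,), frames, dtype=torch.long)
target_lengths = torch.full((batch,), targets.shape[1], dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                                    # gradients flow back to the acoustic model
print(float(loss))
```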

Citations: 0
Whisper-SV: Adapting Whisper for low-data-resource speaker verification
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-14 | DOI: 10.1016/j.specom.2024.103103

Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are limited. To fill this gap, we propose a lightweight adaptor framework to boost SV with Whisper, namely Whisper-SV. Given that Whisper is not specifically optimized for SV tasks, we introduce a representation selection module to quantify the speaker-specific characteristics contained in each layer of Whisper and select the top-k layers with prominent discriminative speaker features. To aggregate pivotal speaker-related features while diminishing non-speaker redundancies across the selected top-k distinct layers of Whisper, we design a multi-layer aggregation module in Whisper-SV to integrate multi-layer representations into a singular, compacted representation for SV. In the multi-layer aggregation module, we employ convolutional layers with shortcut connections among different layers to refine speaker characteristics derived from multi-layer representations from Whisper. In addition, an attention aggregation layer is used to reduce non-speaker interference and amplify speaker-specific cues for SV tasks. Finally, a simple classification module is used for speaker classification. Experiments on VoxCeleb1, FFSVC, and IMSV datasets demonstrate that Whisper-SV achieves EER/minDCF of 2.22%/0.307, 6.14%/0.488, and 7.50%/0.582, respectively, showing superior performance in low-data-resource SV scenarios.
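
A conceptual PyTorch sketch of the multi-layer aggregation step, assuming the top-k Whisper encoder layer outputs are already available (random tensors stand in for them here, since the actual layer extraction and the representation-selection criterion are not reproduced); the learned softmax weights approximate the attention-style aggregation described, and all shapes and names are assumptions.

```python
import torch
import torch.nn as nn

class MultiLayerAggregation(nn.Module):
    """Collapse k selected encoder layers into one representation with learned softmax weights."""
    def __init__(self, num_layers, dim):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(dim, dim)

    def forward(self, layer_states):        # (k, batch, frames, dim)
        w = torch.softmax(self.layer_logits, dim=0)          # per-layer weights
        fused = (w[:, None, None, None] * layer_states).sum(dim=0)
        return self.proj(fused)              # (batch, frames, dim), ready for pooling + classifier

# stand-ins for the top-k Whisper encoder layers picked by the selection module
k, batch, frames, dim = 4, 2, 150, 512
layer_states = torch.randn(k, batch, frames, dim)
speaker_features = MultiLayerAggregation(k, dim)(layer_states)
print(speaker_features.shape)                # torch.Size([2, 150, 512])
```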

Citations: 0
Advancing speaker embedding learning: Wespeaker toolkit for research and production
IF 2.4 | CAS Tier 3, Computer Science | Q2 ACOUSTICS | Pub Date: 2024-07-01 | DOI: 10.1016/j.specom.2024.103104

Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime codes, supporting mainstream platforms such as Windows, Linux, Mac and on-device chips such as horizon X3’PI. Wespeaker also provides off-the-shelf high-quality speaker embeddings by providing various pretrained models, which can be effortlessly applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
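
Independent of Wespeaker's own API (which is not reproduced here), the sketch below illustrates the generic verification step that such speaker embeddings feed into: cosine scoring of enrollment/test trial pairs and an equal error rate read off the ROC curve. The embeddings and trials are toy stand-ins, not Wespeaker code or data.

```python
import numpy as np
from sklearn.metrics import roc_curve

def cosine_score(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def equal_error_rate(labels, scores):
    """EER: the operating point where false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    i = int(np.nanargmin(np.abs(fnr - fpr)))
    return float((fpr[i] + fnr[i]) / 2)

rng = np.random.default_rng(0)
speakers = {name: rng.normal(size=256) for name in ["A", "B", "C"]}
# trials: (enrollment speaker, test speaker, same-speaker label)
trials = [("A", "A", 1), ("A", "B", 0), ("B", "B", 1), ("B", "C", 0), ("C", "C", 1), ("A", "C", 0)]
scores = [cosine_score(speakers[e] + rng.normal(scale=0.3, size=256), speakers[t]) for e, t, _ in trials]
labels = [y for _, _, y in trials]
print("EER:", equal_error_rate(labels, scores))
```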

Citations: 0