Latest Publications in Speech Communication

Chinese speech intelligibility and speech intelligibility index for the elderly
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-21, DOI: 10.1016/j.specom.2024.103072
Jiazhong Zeng, Jianxin Peng, Shuyin Xiang

The speech intelligibility index (SII) and speech transmission index (STI) are widely accepted objective metrics for assessing speech intelligibility. In previous work, the relationship between STI and Chinese speech intelligibility (CSI) scores was studied. In this paper, the relationship between SII and CSI scores in rooms is investigated for elderly listeners aged 60–69 and over 70, using the auralization method under different background noise levels (40 dBA and 55 dBA) and different reverberation times. The results show that SII correlates well with the CSI scores of the elderly. To achieve the same CSI score as young adults, the elderly require a larger SII value, and the required value increases with age. Since the hearing loss of the elderly is accounted for in the calculation of SII, the difference in required SII between elderly and young listeners at the same CSI score is smaller than the corresponding difference in required STI. This indicates that SII is a more consistent evaluation criterion across ages.
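At its core, the SII is an importance-weighted sum of per-band audibility. As an illustration only, here is a minimal Python sketch of that band-audibility idea; the band levels and importance weights are hypothetical, and the full ANSI S3.5-1997 standard adds several corrections, including the hearing-threshold terms relevant to elderly listeners.

```python
# Simplified SII-style computation: per-band SNR is clipped to [-15, +15] dB,
# mapped to an audibility in [0, 1], and summed with importance weights.
# All numbers below are assumed values for illustration.
import numpy as np

def simplified_sii(speech_db, noise_db, band_importance):
    snr = np.asarray(speech_db, dtype=float) - np.asarray(noise_db, dtype=float)
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    return float(np.dot(band_importance, audibility))

importance = np.array([0.10, 0.15, 0.25, 0.25, 0.15, 0.10])  # sums to 1
speech = np.array([55, 58, 60, 57, 50, 45])  # band levels in dB (assumed)
noise = np.array([40, 42, 48, 50, 47, 44])   # band noise in dB (assumed)
print(f"SII ~ {simplified_sii(speech, noise, importance):.2f}")
```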

Citations: 0
Combined approach to dysarthric speaker verification using data augmentation and feature fusion
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-06, DOI: 10.1016/j.specom.2024.103070
Shinimol Salim, Syed Shahnawazuddin, Waquar Ahmad

This study addresses the challenges of adapting automatic speaker verification (ASV) systems to individuals with dysarthria, a speech disorder affecting intelligibility and articulation. The scarcity of dysarthric speech data presents a significant obstacle to developing an effective ASV system. To mitigate the detrimental effects of data paucity, an out-of-domain data augmentation approach was employed, based on the observation that dysarthric speech often exhibits longer phoneme durations. Motivated by this observation, the duration of healthy speech data was modified with various stretching factors and then pooled into training, resulting in a significant reduction in the error rate. In addition to the analysis of average phoneme duration, a further analysis revealed that dysarthric speech contains crucial high-frequency spectral information. However, Mel-frequency cepstral coefficients (MFCC) inherently down-sample spectral information in the higher-frequency regions, and the same is true of Mel-filterbank features. To address this shortcoming, linear-filterbank cepstral coefficients (LFCC) were used in combination with MFCC features. While MFCC effectively captures certain aspects of dysarthric speech, LFCC complements it by capturing the high-frequency details essential for accurate dysarthric speaker verification. The proposed feature fusion effectively minimizes spectral information loss, further reducing error rates. To demonstrate the significance of combining MFCC and LFCC features in an ASV system for speakers with dysarthria, comprehensive experiments were conducted. The fusion of MFCC and LFCC features was compared with several other front-end acoustic features, such as Mel-filterbank features, linear filterbank features, wavelet filterbank features, linear prediction cepstral coefficients (LPCC), frequency-domain LPCC, and constant-Q cepstral coefficients (CQCC). The approaches were evaluated using both i-vector and x-vector representations, comparing systems developed with MFCC and LFCC features individually and in combination. The experimental results demonstrate substantial improvements, with a 25.78% reduction in equal error rate (EER) for i-vector models and a 23.66% reduction in EER for x-vector models compared to the baseline ASV system. Additionally, the effect of feature concatenation across dysarthria severity levels (low, medium, and high) was studied, and the proposed approach was found to be highly effective in those cases as well.
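As a rough illustration of the two ingredients above (tempo stretching of healthy speech and MFCC + LFCC fusion), here is a Python sketch. The file name, stretch factors, and filterbank settings are assumptions, not the paper's configuration.

```python
# Sketch: duration-based augmentation plus MFCC + LFCC feature fusion.
# "healthy_utt.wav" and all settings below are assumptions.
import numpy as np
import librosa
from scipy.fft import dct

def lfcc(y, sr, n_filters=40, n_coeff=13, n_fft=512, hop=160):
    """LFCC: like MFCC, but with triangular filters spaced linearly in Hz,
    which preserves resolution in the high-frequency region."""
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    edges = np.linspace(0, sr / 2, n_filters + 2)            # linear spacing
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                               # triangular filters
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(fbank @ spec + 1e-10)
    return dct(log_energy, axis=0, norm="ortho")[:n_coeff]

y, sr = librosa.load("healthy_utt.wav", sr=16000)            # assumed file
augmented = [librosa.effects.time_stretch(y, rate=r)         # rate < 1 lengthens
             for r in (0.8, 0.9, 1.0)]                       # assumed factors
features = [np.vstack([librosa.feature.mfcc(y=a, sr=sr, n_mfcc=13,
                                            n_fft=512, hop_length=160),
                       lfcc(a, sr)])                         # fused 26-dim frames
            for a in augmented]
```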

Citations: 0
Speech intelligibility prediction using generalized ESTOI with fine-tuned parameters
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-01, DOI: 10.1016/j.specom.2024.103068
Szymon Drgas

In this article, a lightweight and interpretable speech intelligibility prediction network is proposed. It is based on the ESTOI metric with several extensions: a learned modulation filterbank, temporal attention, and accounting for the robustness of a given reference recording. The proposed network is differentiable and can therefore be applied as a loss function in speech enhancement systems. The method was evaluated using the Clarity Prediction Challenge dataset. Compared to MB-STOI, the best of the systems proposed in this paper reduced the RMSE from 28.01 to 21.33. It also outperformed the best-performing systems from the Clarity Challenge, while its training requires no additional labels such as the speech enhancement system or the talker. Its memory and compute requirements are small, so it can potentially serve as a loss function for training a speech enhancement system; the resources saved can then be devoted to a larger speech enhancement neural network.
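Because the paper's key property is differentiability, a toy PyTorch sketch of an ESTOI-style loss built from segment-wise envelope correlations may help fix ideas. It is a simplified stand-in under assumed STFT settings, not the proposed network or the official metric.

```python
# Toy differentiable intelligibility-style loss: 1 minus the mean correlation
# between clean and degraded magnitude envelopes over short time segments.
import torch

def envelope_correlation_loss(clean, degraded, n_fft=512, hop=256, seg=30):
    win = torch.hann_window(n_fft, device=clean.device)
    C = torch.stft(clean, n_fft, hop, window=win, return_complex=True).abs()
    D = torch.stft(degraded, n_fft, hop, window=win, return_complex=True).abs()
    T = (C.shape[-1] // seg) * seg            # drop the ragged tail
    C = C[..., :T].unfold(-1, seg, seg)       # (freq, n_seg, seg)
    D = D[..., :T].unfold(-1, seg, seg)
    C = C - C.mean(-1, keepdim=True)
    D = D - D.mean(-1, keepdim=True)
    corr = (C * D).sum(-1) / (C.norm(dim=-1) * D.norm(dim=-1) + 1e-8)
    return 1.0 - corr.mean()

clean = torch.randn(16000)
degraded = (clean + 0.1 * torch.randn(16000)).requires_grad_()
loss = envelope_correlation_loss(clean, degraded)
loss.backward()   # gradients flow, so the loss can train an enhancement net
```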

Citations: 0
An ensemble technique to predict Parkinson's disease using machine learning algorithms
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-01, DOI: 10.1016/j.specom.2024.103067
Nutan Singh, Priyanka Tripathi

Parkinson's Disease (PD) is a progressive neurodegenerative disorder with motor and non-motor symptoms. Its symptoms develop slowly, making early identification difficult. Machine learning has significant potential to predict Parkinson's disease from features hidden in voice data. This work aimed to identify the most relevant features in a high-dimensional dataset, enabling accurate classification of Parkinson's Disease with less computation time. Three datasets with various voice-based medical features were analyzed in this work. An Ensemble Feature Selection Algorithm (EFSA) based on filter, wrapper, and embedded algorithms, which picks features highly relevant to identifying Parkinson's Disease, is proposed and validated on the three voice-based datasets. These techniques can shorten training time, improve model accuracy, and minimize overfitting. We utilized several ML models: K-Nearest Neighbors (KNN), Random Forest, Decision Tree, Support Vector Machine (SVM), Bagging Classifier, Multi-Layer Perceptron (MLP) Classifier, and Gradient Boosting. Each of these models was fine-tuned to ensure optimal performance in our specific context. Moreover, in addition to these established classifiers, we propose an ensemble classifier based on the optimal majority of votes. Dataset-I achieves 97.6% classification accuracy, 97.9% F1-score, 98% precision, and 98% recall. Dataset-II achieves 90.2% classification accuracy, 90.2% F1-score, 90.2% precision, and 90.5% recall. Dataset-III achieves 83.3% accuracy, 83.3% F1-score, 83.5% precision, and 83.3% recall. These results were obtained using 13 of 23, 45 of 754, and 17 of 46 features from the respective datasets. The proposed EFSA model performs with higher accuracy and is more efficient than the other models on each dataset.
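For orientation, here is a minimal scikit-learn sketch of the majority-vote ensemble idea. The SelectKBest step is a simple stand-in for the proposed EFSA, and the synthetic data and estimator choices are assumptions (the feature counts loosely mirror Dataset-III's 17 of 46).

```python
# Feature selection followed by a hard-voting ensemble of classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=46, random_state=0)

model = make_pipeline(
    SelectKBest(f_classif, k=17),            # stand-in for the EFSA step
    VotingClassifier(
        estimators=[("knn", KNeighborsClassifier()),
                    ("rf", RandomForestClassifier(random_state=0)),
                    ("svm", SVC(random_state=0))],
        voting="hard"),                      # majority of the votes
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```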

Citations: 0
A multimodal model for predicting feedback position and type during conversation
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-01, DOI: 10.1016/j.specom.2024.103066
Auriane Boudin, Roxane Bertrand, Stéphane Rauzy, Magalie Ochs, Philippe Blache

This study investigates conversational feedback, that is, a listener's reaction to a speaker, a phenomenon that occurs in all natural interactions. Feedback depends on the main speaker's productions and in return supports the elaboration of the interaction. As a consequence, feedback production has a direct impact on the quality of the interaction.

This paper examines all types of feedback, from generic to specific feedback, the latter of which has received less attention in the literature. We also present a fine-grained labeling system introducing two sub-types of specific feedback: positive/negative and given/new. Following a literature review on linguistic and machine learning perspectives highlighting the main issues in feedback prediction, we present a model based on a set of multimodal features which predicts the possible position of feedback and its type. This computational model makes it possible to precisely identify the different features in the speaker's production (morpho-syntactic, prosodic and mimo-gestural) which play a role in triggering feedback from the listener; the model also evaluates their relative importance.

The main contribution of this study is twofold: we sought to improve 1/ the model's performance in comparison with other approaches relying on a small set of features, and 2/ the model's interpretability, in particular by investigating feature importance. By integrating all the different modalities as well as high-level features, our model is uniquely positioned to be applied to French corpora.
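To make the setup concrete, here is a minimal, hypothetical Python sketch: multimodal feature groups are concatenated, a classifier predicts a feedback label, and per-feature importances are inspected. The feature names, label set, and random data are placeholders, not the paper's corpus or model.

```python
# Concatenate multimodal feature groups, predict feedback, inspect importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
prosodic = rng.normal(size=(n, 4))    # e.g. pitch slope, energy, pause length
syntactic = rng.normal(size=(n, 3))   # e.g. POS cues, clause boundaries
gestural = rng.normal(size=(n, 3))    # e.g. head nods, gaze shifts
X = np.hstack([prosodic, syntactic, gestural])
y = rng.integers(0, 3, size=n)        # 0 none, 1 generic, 2 specific feedback

clf = RandomForestClassifier(random_state=0).fit(X, y)
names = ([f"pros_{i}" for i in range(4)] + [f"synt_{i}" for i in range(3)]
         + [f"gest_{i}" for i in range(3)])
for name, imp in zip(names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")       # relative importance of each feature
```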

Citations: 0
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-04-01, DOI: 10.1016/j.specom.2024.103069
Kodali Radha, Mohan Bansal, Ram Bilas Pachori

This paper presents the newly developed non-native children's English speech (NNCES) corpus and reports findings on automatic speaker and age recognition from raw speech. Convolutional neural networks (CNN), which can learn low-level speech representations, can be fed raw speech signals directly instead of traditional hand-crafted features. However, the filters learned by standard CNNs tend to be noisy because all elements of each filter are free parameters. In contrast, sincNet can generate more meaningful filters simply by replacing the first convolutional layer of a standard CNN with a sinc-layer. The low and high cutoff frequencies of the rectangular band-pass filters are the only parameters learned in sincNet, which has the potential to extract significant speech cues from the speaker, such as pitch and formants. In this work, the sincNet model is significantly changed by switching from the baseline Mel-scale initialization to an equivalent rectangular bandwidth (ERB) initialization, which has the added benefit of allocating additional filters in the lower region of the spectrum. It is also worth highlighting that the novel sincNet model is well suited to identifying children's age. Investigations on both read and spontaneous speech for speaker identification and for gender-independent and gender-dependent age-group identification of children outperform the baseline models, with varying relative improvements in accuracy.
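For illustration, the following PyTorch sketch shows a sinc-based first layer whose learnable low/high cutoffs are initialized from frequencies equally spaced on the ERB-rate scale. It is a toy re-implementation under assumed hyperparameters, not the authors' code (the original SincNet also windows and normalizes the filters).

```python
# Toy sinc-layer with cutoffs initialized on the ERB-rate scale.
import math
import torch

def erb_space(fmin, fmax, n):
    """n frequencies equally spaced on the ERB-rate (Cams) scale."""
    hz_to_erb = lambda f: 21.4 * math.log10(1.0 + 4.37 * f / 1000.0)
    e = torch.linspace(hz_to_erb(fmin), hz_to_erb(fmax), n)
    return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

class SincLayer(torch.nn.Module):
    """Band-pass h(t) = 2*f2*sinc(2*f2*t) - 2*f1*sinc(2*f1*t); only the
    cutoff frequencies f1 and f2 are learned."""
    def __init__(self, n_filters=80, kernel=251, sr=16000):
        super().__init__()
        centers = erb_space(50.0, sr / 2 - 100.0, n_filters)
        band = torch.full((n_filters,), 100.0)        # assumed initial width
        self.f1 = torch.nn.Parameter(centers - band / 2)
        self.f2 = torch.nn.Parameter(centers + band / 2)
        self.register_buffer("t", (torch.arange(kernel) - kernel // 2) / sr)
        self.sr = sr

    def forward(self, x):                             # x: (batch, 1, time)
        f1 = self.f1.abs().clamp(min=1.0).unsqueeze(1)
        f2 = (self.f1.abs() + (self.f2 - self.f1).abs()).clamp(
            max=self.sr / 2).unsqueeze(1)             # keep f2 above f1
        h = (2 * f2 * torch.sinc(2 * f2 * self.t)
             - 2 * f1 * torch.sinc(2 * f1 * self.t))
        return torch.nn.functional.conv1d(x, h.unsqueeze(1), padding="same")

layer = SincLayer()
out = layer(torch.randn(2, 1, 16000))   # -> shape (2, 80, 16000)
```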

Citations: 0
The effect of musical expertise on whistled vowel identification
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-03-17, DOI: 10.1016/j.specom.2024.103058
Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier

In this paper, we examined the impact of musical experience on whistled vowel categorization by native French speakers. Whistled speech, a natural yet modified speech type, augments speech amplitude while transposing the signal to a range of fairly high frequencies, i.e., 1 to 4 kHz. Whistled vowels are simple pitches of different heights depending on vowel position, and generally represent the most stable part of the signal, just as in modal speech. They are modulated by consonant coarticulation, resulting in characteristic pitch movements. This change in speech mode can liken the speech signal to musical notes and their modulations; however, the mechanisms used to categorize whistled phonemes rely on abstract phonological knowledge and representation. Here we explore the impact of musical expertise on this process by focusing on four whistled vowels (/i, e, a, o/) used in previous experiments with non-musicians. We also included inter-speaker production variations, adding variability to the vowel pitches. Our results showed that all participants categorized whistled vowels well above chance, with musicians showing advantages for the middle whistled vowels (/a/ and /e/) as well as for the lower whistled vowel /o/. Whistler variability also affected musicians more than non-musicians and impacted their advantage, notably for the vowels /e/ and /o/. However, we found no overall training advantage for musicians across the whole experiment, but rather training effects for /a/ and /e/ when taking all participants into account. This suggests that although musical experience may help structure the vowel hierarchy when the whistler has a larger range, this advantage does not generalize when listening to another whistler. Thus, the transfer of musical knowledge present in this task only influences certain aspects of speech perception.

Citations: 0
Symmetric and asymmetric Gaussian weighted linear prediction for voice inverse filtering
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-03-06, DOI: 10.1016/j.specom.2024.103057
I.A. Zalazar, G.A. Alzamendi, G. Schlotthauer

Weighted linear prediction (WLP) has demonstrated its significance in voice inverse filtering, contributing to enhanced methods for estimating both the vocal tract filter and the glottal source. WLP provides a mechanism to mitigate the effect, on the linear prediction model, of the voice samples that degrade vocal tract filter estimation, particularly samples around glottal closure instants (GCIs). This article studies the Gaussian weighted linear prediction (GLP) strategy, which employs a Gaussian attenuation window centered at the GCIs to reduce their contribution to the WLP analysis. In this study, the Gaussian attenuation is revisited and a parameterization of the window that adjusts to the typical variability in voice periodicity is introduced. In addition, an asymmetric Gaussian window is proposed to diminish the relevance of voice samples preceding GCIs in the WLP model, thus providing a quasi-closed-phase inverse filtering method. The symmetric and asymmetric GLP methods for glottal source estimation are characterized on synthetic and natural phonation data, yielding a set of optimal parameters for the Gaussian attenuation windows. The results show that the proposed asymmetric attenuation improves voice inverse filtering with respect to the symmetric GLP method. Comparisons with other state-of-the-art techniques suggest that the proposed GLP approaches are competitive, falling slightly short in performance only when contrasted with the well-known quasi-closed-phase inverse filtering analysis. The simplicity of implementing the attenuation windows, coupled with their robust performance, positions the proposed GLP methods as two attractive and straightforward voice inverse filtering techniques for practical application.
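As a rough sketch of the mechanics, the NumPy code below solves the weighted normal equations of linear prediction with a Gaussian attenuation window centered at the GCIs. The window depth and width, the GCI positions, and the stand-in signal are assumptions for illustration, not the paper's parameterization.

```python
# Weighted linear prediction with Gaussian attenuation around GCIs.
import numpy as np

def gaussian_weights(n_samples, gcis, sigma=20.0, depth=0.9):
    """Weights near 1 away from GCIs, dipping toward 1 - depth at each GCI."""
    n = np.arange(n_samples)
    w = np.ones(n_samples)
    for g in gcis:
        w -= depth * np.exp(-0.5 * ((n - g) / sigma) ** 2)
    return np.clip(w, 1.0 - depth, 1.0)

def wlp(signal, order, weights):
    """Solve the weighted normal equations (S^T W S) a = S^T W y, where
    row n of S holds the `order` past samples that predict y[n]."""
    y = signal[order:]
    S = np.stack([signal[order - k:len(signal) - k]
                  for k in range(1, order + 1)], axis=1)
    W = weights[order:]
    a = np.linalg.solve(S.T @ (W[:, None] * S), S.T @ (W * y))
    residual = y - S @ a      # inverse-filter output (glottal excitation)
    return a, residual

frame = np.random.default_rng(0).normal(size=400)   # stand-in voice frame
w = gaussian_weights(len(frame), gcis=[100, 200, 300])
coeffs, resid = wlp(frame, order=12, weights=w)
```

An asymmetric variant would simply use different sigma values on the left and right sides of each GCI, attenuating pre-GCI samples more strongly.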

Citations: 0
Language fusion via adapters for low-resource speech recognition
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-03-01, DOI: 10.1016/j.specom.2024.103037
Qing Hu, Yan Zhang, Xianlei Zhang, Zongyu Han, Xiuxia Liang

Data scarcity causes low-resource speech recognition systems to suffer from severe overfitting. Although fine-tuning addresses this issue to some extent, it leads to parameter-inefficient training. In this paper, a novel language knowledge fusion method, named LanFusion, is proposed. It builds on the recently popular adapter-tuning technique, thus maintaining better parameter efficiency than conventional fine-tuning methods. LanFusion is a two-stage method: multiple adapters are first trained on several source languages to extract language-specific and language-invariant knowledge, and the trained adapters are then re-trained on the target low-resource language to fuse the learned knowledge. Compared with Vanilla-adapter, LanFusion obtains relative average word error rate (WER) reductions of 9.8% and 8.6% on the Common Voice and FLEURS corpora, respectively. Extensive experiments demonstrate that the proposed method is not only simple and effective but also parameter-efficient. Moreover, using source languages that are geographically similar to the target language yields better results on both datasets.
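For context, a minimal PyTorch sketch of the adapter-tuning idea follows: a small residual bottleneck module is trained while the backbone stays frozen. In LanFusion's two stages, such adapters would first be trained per source language and then re-trained on the target language; the dimensions and backbone below are assumptions, not the paper's implementation.

```python
# Residual bottleneck adapter trained on top of a frozen backbone.
import torch

class Adapter(torch.nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = torch.nn.Linear(dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, dim)
        torch.nn.init.zeros_(self.up.weight)   # start as an identity mapping
        torch.nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.nn.functional.relu(self.down(x)))

backbone = torch.nn.TransformerEncoderLayer(d_model=768, nhead=8,
                                            batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False                    # backbone stays frozen

adapter = Adapter()                            # only these weights train
h = backbone(torch.randn(2, 50, 768))          # (batch, frames, dim)
out = adapter(h)
```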

Citations: 0
A distortionless convolution beamformer design method based on the weighted minimum mean square error for joint dereverberation and denoising
IF 3.2, CAS Tier 3 (Computer Science), Q1 Arts and Humanities, Pub Date: 2024-03-01, DOI: 10.1016/j.specom.2024.103054
Jing Zhou, Changchun Bao, Maoshen Jia, Wenmeng Xiong

This paper designs a weighted minimum mean square error (WMMSE) based distortionless convolution beamformer (DCBF) for joint dereverberation and denoising. By applying the WMMSE criterion under a distortionless constraint, a DCBF is derived in which the outputs of the weighted prediction error (WPE) filter and the WPE-based minimum variance distortionless response (MVDR) beamformer are combined to initialize the target signal, balancing signal distortion, residual reverberation, and residual noise. In addition, two optimization factors are introduced to further reduce reverberation and noise when the initialized target signal is used to solve for the beamformer. As a result, the designed beamformer takes the form of a linear combination of the WMMSE-based convolution beamformer (CBF) and the weighted power minimization distortionless response (WPD) filter. Experimental results demonstrate the superior performance of the designed beamformer for joint dereverberation and denoising compared to the reference methods.
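As background for the MVDR component mentioned above, here is a minimal NumPy sketch of the standard MVDR weight formula, a textbook building block rather than the paper's WMMSE-based design; the covariance matrix and steering vector are toy values.

```python
# Textbook MVDR weights: w = R^{-1} d / (d^H R^{-1} d).
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Minimize output power subject to a distortionless response, w^H d = 1."""
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
R = A @ A.conj().T + 1e-3 * np.eye(4)           # Hermitian noise covariance
d = np.exp(-1j * np.pi * 0.3 * np.arange(4))    # assumed steering vector
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))                        # ~1: distortionless constraint
```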

Citations: 0