
Speech Communication: Latest Publications

Speech emotion recognition approaches: A systematic review
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-10-01, DOI: 10.1016/j.specom.2023.102974
Ahlam Hashem, Muhammad Arif, Manal Alghamdi

The speech emotion recognition (SER) field has been active since SER became a crucial component of advanced Human–Computer Interaction (HCI), and it is used in a wide range of real-life applications. In recent years, researchers have described numerous SER systems, addressing the availability of appropriate emotional databases, the selection of robust features, and the application of suitable Machine Learning (ML) and Deep Learning (DL) classifiers. Deep models have proved more accurate for SER than conventional ML techniques. Nevertheless, SER classification remains challenging where similar emotional patterns must be separated; this requires a highly discriminative feature representation. To this end, this survey critically analyzes work in this field in light of previous studies that recognize emotions from speech audio, and reviews the current state of DL-based SER. Through a systematic literature review using selected keywords covering 2012–2022, 96 papers were extracted that cover the most current findings and directions. Specifically, we cover databases (acted, evoked, and natural), features (prosodic, spectral, voice quality, and Teager energy operator), and the necessary preprocessing steps. Furthermore, different DL models and their performance are examined in depth. Based on our review, we also suggest SER aspects that could be considered in the future.
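As a minimal illustration of the prosodic and spectral feature families listed above, the sketch below extracts frame-level fundamental frequency and MFCC statistics from a speech file with librosa. It is a generic example under assumed settings (16 kHz audio, a placeholder file name), not the feature set of any specific SER system reviewed in the survey.

```python
import numpy as np
import librosa

# Minimal sketch: prosodic (f0) and spectral (MFCC) features for SER.
# "speech.wav" is a placeholder path; any mono speech recording will do.
y, sr = librosa.load("speech.wav", sr=16000)

# Prosodic: frame-level fundamental frequency via the pYIN tracker.
f0, voiced_flag, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
f0_mean = np.nanmean(f0)                       # mean pitch over voiced frames
f0_range = np.nanmax(f0) - np.nanmin(f0)

# Spectral: 13 MFCCs, summarized by their per-coefficient mean and std.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
feature_vector = np.concatenate([[f0_mean, f0_range],
                                 mfcc.mean(axis=1), mfcc.std(axis=1)])
print(feature_vector.shape)                    # (2 + 13 + 13,) = (28,)
```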

Citations: 1
Acoustic properties of non-native clear speech: Korean speakers of English
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-10-01, DOI: 10.1016/j.specom.2023.102982
Ye-Jee Jung, Olga Dmitrieva

The present study examined the acoustic properties of clear speech produced by non-native speakers of English (L1 Korean) in comparison to native clear speech. L1 Korean speakers of English (N=30) and native speakers of English (N=20) read an English word list in casual and clear speaking styles. The analysis included clear speech correlates thought to be universal across languages (vowel space expansion and stressed vowel lengthening) and clear speech modifications believed to function in a language-specific way (increased mean voice fundamental frequency (fo) and fo range, and acoustic enhancement of the English voicing contrast via voice onset time (VOT) and onset fo). The results showed that, across the two groups of participants, clear speech was acoustically distinct from casual speech in every parameter. In addition, the presence and direction of the acoustic difference between the two speaking styles was largely the same for native and non-native speakers. Nevertheless, non-native clear speech differed from native clear speech in the magnitude of acoustic modifications. Specifically, L2 speakers implemented less vowel space expansion, a smaller increase in mean fo, and less positive and negative VOT lengthening in clear speech than native speakers. Many, but not all, differences between native and non-native clear speech could be attributed to the effect of participants’ L1, for example, the lower functional load of VOT in differentiating Korean laryngeal categories and the absence of a pitch-dependent system of prosodic prominence in Korean. Overall, we conclude that Korean speakers were successful in producing native-like English clear speech, although the observed deviations in the extent of acoustic modifications suggest some reliance on the L1-specific prosodic system and on L1-specific phonetic implementation and enhancement of phonological contrasts.
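As a concrete illustration of two of the measures discussed (vowel space expansion and fo range), the sketch below computes the area of the F1–F2 quadrilateral spanned by the corner vowels with the shoelace formula, along with a simple fo range. All numbers are invented placeholders, not data from this study.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula for the area of a polygon given ordered (x, y) vertices."""
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Hypothetical mean (F2, F1) values in Hz for the corner vowels /i ae a u/,
# listed in perimeter order (illustrative values only).
casual = [(2200, 300), (1900, 650), (1300, 750), (900, 350)]
clear  = [(2400, 280), (2000, 700), (1250, 820), (800, 330)]

# Vowel space expansion: clear speech is expected to span a larger F1-F2 area.
print("casual area (Hz^2):", polygon_area(casual))
print("clear  area (Hz^2):", polygon_area(clear))

# fo range from hypothetical per-utterance mean fo values (Hz).
fo_clear = np.array([210.0, 245.0, 230.0, 260.0])
print("mean fo:", fo_clear.mean(), "fo range:", fo_clear.max() - fo_clear.min())
```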

Citations: 0
DNN controlled adaptive front-end for replay attack detection systems
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-10-01, DOI: 10.1016/j.specom.2023.102973
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Julien Epps, Haizhou Li, Ting Dang

Developing robust countermeasures to protect automatic speaker verification systems against replay spoofing attacks is a well-recognized challenge. Current approaches to spoofing detection are generally based on a fixed front-end, typically a time-invariant filter bank, followed by a machine learning back-end. In this paper, we propose a novel approach in which the front-end comprises an adaptive filter bank with a deep neural network-based controller that is jointly trained with a neural network back-end. Specifically, the deep neural network-based adaptive filter controller tunes the selectivity and sensitivity of the front-end filter bank at every frame to capture replay-related artefacts. We demonstrate the effectiveness of the proposed framework for spoofing attack detection on a synthesized dataset and on the ASVspoof 2019 and ASVspoof 2021 challenge datasets, in terms of equal error rate and of its ability, compared with conventional non-adaptive front-ends, to capture artefacts that differentiate replayed signals from genuine ones.
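The sketch below is a loose PyTorch illustration of the general idea of a frame-wise, jointly trained controller: a small recurrent network predicts per-band gains ("sensitivity") and a sharpening exponent ("selectivity") that reshape a fixed filter-bank output before the back-end classifier. It is not the authors' architecture, which adapts an actual front-end filter bank; all layer sizes and the feature pipeline are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFrontEnd(nn.Module):
    """Illustrative only: a GRU controller predicts per-frame, per-band gains
    ("sensitivity") and a sharpening exponent ("selectivity") that reshape a
    fixed filter-bank output before it reaches the back-end."""
    def __init__(self, n_bands=40, hidden=64):
        super().__init__()
        self.controller = nn.GRU(input_size=n_bands, hidden_size=hidden, batch_first=True)
        self.gain_head = nn.Linear(hidden, n_bands)   # per-band sensitivity in (0, 1)
        self.sharp_head = nn.Linear(hidden, 1)        # per-frame selectivity >= 1

    def forward(self, fbank):                         # fbank: (B, T, n_bands), non-negative energies
        h, _ = self.controller(fbank)
        gains = torch.sigmoid(self.gain_head(h))
        sharp = 1.0 + nn.functional.softplus(self.sharp_head(h))
        return gains * fbank.clamp(min=1e-6).pow(sharp)

class ReplayDetector(nn.Module):
    """Front-end and back-end share one loss, so the controller is trained jointly."""
    def __init__(self, n_bands=40):
        super().__init__()
        self.front_end = AdaptiveFrontEnd(n_bands)
        self.back_end = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, fbank):
        feats = self.front_end(fbank)                 # adapted at every frame
        return self.back_end(feats.mean(dim=1))       # utterance-level logits: genuine vs. replay

# Usage sketch: random batch of 8 utterances, 200 frames, 40 bands.
logits = ReplayDetector()(torch.rand(8, 200, 40))
print(logits.shape)                                   # torch.Size([8, 2])
```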

Citations: 0
Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-10-01, DOI: 10.1016/j.specom.2023.102972
Eder Pereira Neves, Marco Aparecido Queiroz Duarte, Jozue Vieira Filho, Caio Cesar Enside de Abreu, Bruno Rodrigues de Oliveira

This paper presents a new method to evaluate the quality of speech signals through images generated from a psychoacoustic model, estimating PESQ (ITU-T P.862) values using a first-order Sugeno fuzzy approach implemented in the Adaptive Neuro-Fuzzy Inference System (ANFIS). The factors feeding the network were obtained from the perceptual model coefficients using an image-processing technique. All simulations were performed using a database containing clean signals and signals corrupted by eight types of noise found in everyday situations. The proposal uses the PESQ values of the signals to train the network. The analyses showed that predictive performance depends on the choice of psychoacoustic model, the factor extraction technique, the combination of these factors, the fuzzification algorithm, and the type of membership function in the ANFIS input space. The data sets for training and testing for each signal directory were randomly created and executed fifty times. The proposal achieves the best prediction values for PESQ when the averages of the measurements reach MAPE ≤ 0.09, RMSE ≤ 0.20, and R² ≥ 95. In general, the approach provided satisfactory results compared to Multilayer Perceptron networks with different learning algorithms, to another psychoacoustic model, to ITU-T P.563, and to other non-intrusive methods that evaluate the quality of voice signals, and it was efficient regardless of the number of signals and the database used.
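For reference, the sketch below computes the three reported error measures (MAPE, RMSE, and the coefficient of determination R²) for predicted versus reference PESQ scores. The score values are invented solely to exercise the functions.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed as a fraction (0.09 == 9%)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented PESQ scores, purely to exercise the functions.
ref  = [1.8, 2.4, 3.1, 3.9, 4.2]
pred = [1.9, 2.3, 3.0, 3.8, 4.3]
print(mape(ref, pred), rmse(ref, pred), r2(ref, pred))
```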

Citations: 0
Determining spectral stability in vowels: A comparison and assessment of different metrics
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-10-01, DOI: 10.1016/j.specom.2023.102984
Jérémy Genette, Jose Manuel Rivera Espejo, Steven Gillis, Jo Verhoeven

This study investigated the performance of several metrics used to evaluate spectral stability in vowels. Four metrics suggested in the literature and a newly developed one were tested and compared to the traditional method of associating the spectrally stable portion with the middle of the vowel. First, synthetic stimuli whose spectrally stable portion had been defined in advance were used to evaluate the potential of the different metrics to capture spectral stability. Second, the acoustic measurements obtained in the vowel portions that each metric identified as spectrally stable were compared on both synthesized and natural speech. It is clear that higher-dimensional features are needed to capture spectral stability and that the best-performing metrics yield acoustic measurements similar to those obtained in the middle of the vowel. This study empirically validates long-standing intuitions about the validity of selecting the middle section of vowels as the preferred method for identifying the spectrally stable region in vowels.
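To make the notion of a spectral-stability metric concrete, the sketch below scores each analysis window of a vowel by its mean frame-to-frame change in MFCCs, selects the most stable window, and compares it with a window centred on the vowel midpoint. This is one generic stability criterion chosen for illustration, not one of the specific metrics evaluated in the study, and the file path is a placeholder.

```python
import numpy as np
import librosa

# Load a vowel token (placeholder path) and compute frame-level MFCCs.
y, sr = librosa.load("vowel.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12).T        # shape: (n_frames, 12)

win = 5                                                      # window length in frames
# Instability of each window = mean frame-to-frame spectral change inside it.
deltas = np.linalg.norm(np.diff(mfcc, axis=0), axis=1)       # (n_frames - 1,)
instability = np.array([deltas[i:i + win - 1].mean()
                        for i in range(len(deltas) - win + 2)])

stable_start = int(np.argmin(instability))                   # most spectrally stable window
mid_start = (mfcc.shape[0] - win) // 2                       # window centred on the vowel midpoint

stable_mean = mfcc[stable_start:stable_start + win].mean(axis=0)
mid_mean = mfcc[mid_start:mid_start + win].mean(axis=0)
print("distance between the two measurement windows:",
      float(np.linalg.norm(stable_mean - mid_mean)))
```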

Citations: 0
Subband fusion of complex spectrogram for fake speech detection
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-09-29, DOI: 10.1016/j.specom.2023.102988
Cunhang Fan, Jun Xue, Shunbo Dong, Mingming Ding, Jiangyan Yi, Jinpeng Li, Zhao Lv

Phase information has been shown to be useful for fake speech detection. However, the most common reason why phase-based features are not widely used is phase wrapping, which makes the original phase hard to model directly. Therefore, how to utilize phase information effectively remains a challenge. To address this issue, this paper proposes a novel subband-fusion method over the complex spectrogram for fake speech detection. The complex spectrogram, containing both the amplitude and the phase spectrogram, is used as the input feature. In addition, subbands of the complex spectrogram are modeled separately, motivated by the fact that each frequency band has a different effect on the fake speech detection task. Finally, to make full use of the subbands, the subband results are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system achieves an equal error rate (EER) of 0.68% and a minimum tandem detection cost function (min t-DCF) of 0.0224.
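The sketch below shows one straightforward way to build the kind of input described above: a complex STFT stacked as real and imaginary channels and split into equal-width frequency subbands, each of which could feed a separate model branch before fusion. The STFT settings and the number of bands are arbitrary illustrative choices, not the paper's configuration.

```python
import numpy as np
from scipy.signal import stft

def complex_subbands(x, fs=16000, n_fft=512, hop=160, n_bands=4):
    """Return a list of per-subband arrays of shape (2, freq_bins, frames),
    where channel 0 holds the real part and channel 1 the imaginary part."""
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    spec = np.stack([Z.real, Z.imag])                 # (2, n_fft // 2 + 1, frames)
    bands = np.array_split(spec, n_bands, axis=1)     # split along the frequency axis
    return bands                                      # each branch models one band; results are fused later

# Usage sketch on one second of random "audio".
x = np.random.randn(16000)
for i, band in enumerate(complex_subbands(x)):
    print(f"subband {i}: {band.shape}")
```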

Citations: 0
Post-processing automatic transcriptions with machine learning for verbal fluency scoring
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-09-27, DOI: 10.1016/j.specom.2023.102990
Justin Bushnell, Frederick Unverzagt, Virginia G. Wadley, Richard Kennedy, John Del Gaizo, David Glenn Clark

Objective

To compare verbal fluency scores derived from manual transcriptions to those obtained using automatic speech recognition enhanced with machine learning classifiers.

Methods

Using Amazon Web Services, we automatically transcribed verbal fluency recordings from 1400 individuals who performed both animal and letter F verbal fluency tasks. We manually adjusted the timings and contents of the automatic transcriptions to obtain “gold standard” transcriptions. To make automatic scoring possible, we trained machine learning classifiers to distinguish between valid and invalid utterances. We then calculated and compared verbal fluency scores from the manual and automatic transcriptions.

Results

For both animal and letter fluency tasks, we achieved good separation of valid versus invalid utterances. Verbal fluency scores calculated based on automatic transcriptions showed high correlation with those calculated after manual correction.

Conclusion

Many techniques for scoring verbal fluency word lists require accurate transcriptions with word timings. We show that machine learning methods can be applied to improve off-the-shelf ASR for this purpose. These automatically derived scores may be satisfactory for some applications. Low correlations among some of the scores indicate the need for improvement in automatic speech recognition before a fully automatic approach can be reliably implemented.
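As a minimal, self-contained illustration of the kind of valid/invalid utterance classifier described in the Methods above, the sketch below trains a logistic regression on hypothetical per-word features (duration, ASR confidence, a repetition flag) and derives a fluency score by counting accepted items. Both the features and the data are invented stand-ins, not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-word features derived from an ASR transcript: token duration (s),
# ASR confidence, and a flag for whether the token repeats an earlier response.
# The data below are random stand-ins, not the study's features or data.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.1, 1.2, 500),      # duration
                     rng.uniform(0.3, 1.0, 500),      # confidence
                     rng.integers(0, 2, 500)])        # repetition flag
y = rng.integers(0, 2, 500)                           # 1 = valid fluency item, 0 = invalid

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Automatic verbal fluency score = number of test items the classifier accepts as valid.
print("automatic score:", int(clf.predict(X_test).sum()))
```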

Citations: 0
Chirplet transform based time frequency analysis of speech signal for automated speech emotion recognition
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-09-23, DOI: 10.1016/j.specom.2023.102986
Siba Prasad Mishra, Pankaj Warule, Suman Deb

Recognition of emotion from the speech signal has gained popularity because of its many applications in fields such as medicine, online marketing, online search engines, education, criminal investigations, and traffic collisions. Many researchers have adopted different methodologies to improve emotion classification accuracy using speech signals. In our study, time–frequency (TF) analysis-based features were used to analyze emotion classification performance. We used a novel TF analysis method, the chirplet transform (CT), to obtain the TF matrix of the speech signal. We then calculated the proposed TF-based permutation entropy (TFPE) feature from the TF matrix. To reduce the feature dimension and select the most informative emotional features, we employed genetic algorithm (GA) feature selection. The selected TFPE features were then used as input to machine learning classifiers such as SVM, RF, DT, and KNN to classify the emotions in the speech signal. Without GA feature selection, we obtained classification accuracies of 77.2%, 69.57%, 68.78%, 56.9%, and 99.1% for the EMO-DB, EMOVO, SAVEE, IEMOCAP, and TESS datasets, respectively. With GA feature selection, emotion classification accuracy increased to 85.6%, 78.33%, 77.76%, 63.15%, and 100%. We compared our results with other methods and found that our method performed better in emotion classification than state-of-the-art methods.
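Because the proposed TFPE feature builds on permutation entropy, the sketch below implements standard permutation entropy for a one-dimensional sequence and applies it, for illustration, to each frequency row of a toy time-frequency matrix. The embedding order, delay, and per-row application are illustrative assumptions rather than the paper's exact definition.

```python
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence (0 = fully regular, 1 = maximally irregular)."""
    x = np.asarray(x, dtype=float)
    patterns = {p: 0 for p in permutations(range(order))}
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        patterns[tuple(np.argsort(window))] += 1     # ordinal pattern of this window
    probs = np.array([c for c in patterns.values() if c > 0], dtype=float) / n
    return float(-(probs * np.log2(probs)).sum() / math.log2(math.factorial(order)))

# Toy time-frequency matrix (rows = frequency bins, columns = frames); random stand-in data.
tf_matrix = np.abs(np.random.randn(32, 200))
tfpe = np.array([permutation_entropy(row, order=3, delay=1) for row in tf_matrix])
print("per-bin permutation entropy vector:", tfpe.shape)   # one feature value per frequency bin
```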

Citations: 1
CAST: Context-association architecture with simulated long-utterance training for mandarin speech recognition
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-09-22, DOI: 10.1016/j.specom.2023.102985
Yue Ming, Boyang Lyu, Zerui Li

End-to-end (E2E) models are widely used because they significantly improve the performance of automatic speech recognition (ASR). However, owing to the limitations of existing hardware computing devices, previous studies mainly focus on short utterances. Typically, utterances used for ASR training do not last much longer than 15 s, and therefore the models often fail to generalize to longer utterances at inference time. To address the challenge of long-form speech recognition, we propose a novel Context-Association Architecture with Simulated Long-utterance Training (CAST), which consists of a Context-Association RNN-Transducer (CARNN-T) and a simulated long utterance training (SLUT) strategy. The CARNN-T obtains sentence-level contextual information by attending to historical utterances across sentences and adds it at the inference stage, which improves the robustness of long-form speech recognition. The SLUT strategy simulates long-form audio training by updating the recursive state, which can alleviate the length mismatch between training and testing utterances. Experiments on the test sets of the Aishell-1 and aidatatang_200zh synthetic corpora show that our model achieves the best recognition performance on long utterances, with character error rates (CER) of 12.0% and 12.6%, respectively.
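As a rough illustration of the state carry-over idea behind simulated long-utterance training, the PyTorch sketch below feeds a long feature sequence to a recurrent encoder in short chunks while passing the hidden state from one chunk to the next instead of resetting it. This is a generic sketch of the principle, not the CARNN-T model or the exact SLUT procedure.

```python
import torch
import torch.nn as nn

class ChunkedEncoder(nn.Module):
    """Generic recurrent encoder that can keep its hidden state across chunks."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, feats, state=None):
        out, state = self.rnn(feats, state)
        return out, state

encoder = ChunkedEncoder()
long_utterance = torch.randn(1, 1500, 80)       # ~15 s of frames, stand-in features

# Consume the audio in 300-frame chunks, carrying the recurrent state forward
# so the model sees an effectively long context during training.
state = None
outputs = []
for chunk in long_utterance.split(300, dim=1):
    out, state = encoder(chunk, state)
    state = state.detach()                      # truncate backprop at chunk boundaries
    outputs.append(out)

print(torch.cat(outputs, dim=1).shape)          # torch.Size([1, 1500, 256])
```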

Citations: 0
Comparing Levenshtein distance and dynamic time warping in predicting listeners’ judgments of accent distance
IF 3.2, Tier 3 (Computer Science), Q2 ACOUSTICS, Pub Date: 2023-09-21, DOI: 10.1016/j.specom.2023.102987
Holly C. Lind-Combs, Tessa Bent, Rachael F. Holt, Cynthia G. Clopper, Emma Brown

Listeners attend to variation in segmental and prosodic cues when judging accent strength. The relative contributions of these cues to perceptions of accentedness in English remain open for investigation, although objective accent distance measures (such as Levenshtein distance) appear to be reliable tools for predicting perceptual distance. Levenshtein distance, however, only accounts for phonemic information in the signal. The purpose of the current study was to examine the relative contributions of phonemic (Levenshtein) and holistic acoustic (dynamic time warping) distances from the local accent to listeners’ accent rankings for nine non-local native and nonnative accents. Listeners (n = 52) ranked talkers on perceived distance from the local accent (Midland American English) using a ladder task for three sentence-length stimuli. Phonemic and holistic acoustic distances between Midland American English and the other accents were quantified using both weighted and unweighted Levenshtein distance measures, and dynamic time warping (DTW). Results reveal that all three metrics contribute to perceived accent distance, with the weighted Levenshtein distance slightly outperforming the other measures. Moreover, the relative contribution of phonemic and holistic acoustic cues was driven by the speaker's accent. Both nonnative and non-local native accents were included in this study, and the benefits of considering both accent groups when studying the phonemic and acoustic cues used by listeners are discussed.
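For readers unfamiliar with the two objective measures, the sketch below implements a minimal unweighted Levenshtein distance over phone-symbol sequences and a basic dynamic time warping distance over acoustic feature frames. The weighted variant used in the study and its specific feature choices are not reproduced here; the symbols and frame data are toy examples.

```python
import numpy as np

def levenshtein(a, b):
    """Unweighted edit distance between two symbol sequences (e.g., phone transcriptions)."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return int(d[len(a), len(b)])

def dtw(x, y):
    """Basic DTW distance between two (frames x dims) feature matrices."""
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    cost = np.full((len(x) + 1, len(y) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[len(x), len(y)])

# Toy phone sequences for the same word in two accents (illustrative symbols only).
print(levenshtein(["k", "ae", "t"], ["k", "a", "t"]))     # 1 substitution
# Toy "acoustic" frame sequences of different lengths.
print(dtw(np.random.randn(40, 13), np.random.randn(55, 13)))
```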

Citations: 0