
Speech Communication: Latest Articles

Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-12 | DOI: 10.1016/j.specom.2024.103046
Chunyan Zeng, Shuai Kong, Zhifeng Wang, Shixiong Feng, Nan Zhao, Juan Wang

Current digital speech deletion and insertion tampering detection methods mainly employ the extraction of phase and frequency features of the Electrical Network Frequency (ENF). However, there are some problems with the existing approaches, such as the alignment problem for speech samples with different durations, the sparsity of ENF features, and the small number of tampered speech samples for training, which lead to low accuracy of deletion and insertion tampering detection. Therefore, this paper proposes a tampering detection method for digital speech deletion and insertion based on the ENF Fluctuation Super-vector (ENF-FSV) and deep feature learning representation. By extracting the parameters of ENF phase and frequency fitting curves, feature alignment and dimensionality reduction are achieved, and the alignment and sparsity problems are avoided while the fluctuation information of phase and frequency is extracted. To solve the problem of the insufficient sample size of tampered speech for training, an ENF Universal Background Model (ENF-UBM) is built from a large number of untampered speech samples, and its mean vector is updated to extract the ENF-FSV. Because a shallow representation of ENF features does not highlight the important features, we construct an end-to-end deep neural network that uses an attention mechanism to strengthen the focus on abrupt fluctuation information and enhance the representational power of the ENF-FSV features; the deep ENF-FSV features extracted by the Residual Network (ResNet) module are then fed to the designed classification network for tampering detection. The experimental results show that the proposed method exhibits higher accuracy and better robustness on the Carioca, New Spanish, and ENF High-sampling Group (ENF-HG) databases when compared with state-of-the-art methods.
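
The abstract's alignment and dimensionality-reduction step, fitting curves to the per-frame ENF phase and frequency sequences so that recordings of any duration map to one fixed-length vector, can be illustrated with a minimal sketch. This shows only the general curve-fitting idea in NumPy; the fitting model, its order, and the UBM-based super-vector extraction used in the paper are not specified here, and all names are illustrative.

```python
import numpy as np

def enf_fluctuation_vector(phase_seq, freq_seq, order=8):
    """Fit low-order polynomials to per-frame ENF phase and frequency
    sequences and concatenate the coefficients, so that recordings of
    different lengths yield a fixed-length feature vector."""
    t1 = np.linspace(0.0, 1.0, len(phase_seq))         # normalised time axis
    phase_coeffs = np.polyfit(t1, phase_seq, order)    # phase fluctuation curve
    t2 = np.linspace(0.0, 1.0, len(freq_seq))
    freq_coeffs = np.polyfit(t2, freq_seq, order)      # frequency fluctuation curve
    return np.concatenate([phase_coeffs, freq_coeffs]) # length 2 * (order + 1)

# Toy usage: two recordings of different durations give vectors of equal size.
rng = np.random.default_rng(0)
v1 = enf_fluctuation_vector(rng.normal(0, 0.1, 300), 50 + rng.normal(0, 0.01, 300))
v2 = enf_fluctuation_vector(rng.normal(0, 0.1, 750), 50 + rng.normal(0, 0.01, 750))
assert v1.shape == v2.shape
```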

Citations: 0
Some properties of mental speech preparation as revealed by self-monitoring
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-09 | DOI: 10.1016/j.specom.2024.103043
Hugo Quené, Sieb G. Nooteboom

The main goal of this paper is to improve our insight into the mental preparation of speech, based on speakers' self-monitoring behavior. To this end we re-analyze the aggregated responses from earlier published experiments eliciting speech sound errors. The re-analyses confirm or show that (1) “early” and “late” detections of elicited speech sound errors can be distinguished, with a time delay on the order of 500 ms; (2) a main reason why some errors are detected “early”, others “late”, and others not at all is the size of the phonetic contrast between the error and the target speech sound; (3) repairs of speech sound errors stem from competing (and sometimes active) word candidates. These findings lead to some speculative conclusions regarding the mental preparation of speech. First, there are two successive stages of mental preparation, an “early” and a “late” stage. Second, at the “early” stage of speech preparation, speech sounds are represented as targets in auditory perceptual space, and at the “late” stage as the coordinated motor commands necessary for articulation. Third, repairs of speech sound errors stem from response candidates competing with the error form for the same slot, and some activation is often sustained until after articulation.

Citations: 0
Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-09 | DOI: 10.1016/j.specom.2024.103045
Francesco Sigona, Mirko Grimaldi

In this work, we tested different variants of a Forensic Automatic Speaker Recognition (FASR) system based on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN). To this end, conditions reflecting those of a real forensic voice comparison case were taken into consideration, following the forensic_eval_01 evaluation campaign settings. Using this recent neural model as an embedding extraction block, various normalization strategies at the level of embeddings and scores allowed us to observe the variations in system performance in terms of discriminating power, accuracy and precision metrics. Our findings suggest that the ECAPA-TDNN can be successfully used as a base component of a FASR system, managing to surpass the previous state of the art, at least in the context of the considered operating conditions.
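
As a rough illustration of embedding- and score-level processing in an ECAPA-TDNN based speaker comparison pipeline, the sketch below scores two speaker embeddings with cosine similarity and applies a simple symmetric score normalization (S-norm) against an impostor cohort. It is a generic sketch, not the specific normalization variants evaluated in the paper, and the 192-dimensional embeddings are an assumption.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def s_norm(raw, enrol_emb, test_emb, cohort):
    """Symmetric score normalisation: z-normalise the raw score against the
    score distributions of each embedding versus a cohort of impostors."""
    s_e = np.array([cosine_score(enrol_emb, c) for c in cohort])
    s_t = np.array([cosine_score(test_emb, c) for c in cohort])
    return 0.5 * ((raw - s_e.mean()) / s_e.std() + (raw - s_t.mean()) / s_t.std())

# Toy usage with random stand-ins for 192-dimensional ECAPA-style embeddings.
rng = np.random.default_rng(1)
enrol, test = rng.normal(size=192), rng.normal(size=192)
cohort = rng.normal(size=(200, 192))
raw = cosine_score(enrol, test)
print(raw, s_norm(raw, enrol, test, cohort))
```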

Citations: 0
Automatic classification of neurological voice disorders using wavelet scattering features
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-01 | DOI: 10.1016/j.specom.2024.103040
Madhu Keerthana Yagnavajjula, Kiran Reddy Mittapalle, Paavo Alku, Sreenivasa Rao K., Pabitra Mitra

Neurological voice disorders are caused by problems in the nervous system as it interacts with the larynx. In this paper, we propose to use wavelet scattering transform (WST)-based features in automatic classification of neurological voice disorders. As a part of WST, a speech signal is processed in stages with each stage consisting of three operations – convolution, modulus and averaging – to generate low-variance data representations that preserve discriminability across classes while minimizing differences within a class. The proposed WST-based features were extracted from speech signals of patients suffering from either spasmodic dysphonia (SD) or recurrent laryngeal nerve palsy (RLNP) and from speech signals of healthy speakers of the Saarbruecken voice disorder (SVD) database. Two machine learning algorithms (support vector machine (SVM) and feed forward neural network (NN)) were trained separately using the WST-based features, to perform two binary classification tasks (healthy vs. SD and healthy vs. RLNP) and one multi-class classification task (healthy vs. SD vs. RLNP). The results show that WST-based features outperformed state-of-the-art features in all three tasks. Furthermore, the best overall classification performance was achieved by the NN classifier trained using WST-based features.
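
A minimal sketch of the scattering-plus-classifier pipeline described above is given below, assuming the kymatio package for the 1-D wavelet scattering transform and scikit-learn for the SVM; time-averaging the coefficients, the J and Q values, and the toy signals are illustrative choices, not the paper's configuration.

```python
import numpy as np
from kymatio.numpy import Scattering1D          # assumed: kymatio is installed
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def wst_features(signal, J=6, Q=8):
    """Time-averaged wavelet scattering coefficients for one utterance."""
    scattering = Scattering1D(J=J, shape=signal.shape[-1], Q=Q)
    Sx = scattering(signal)                     # (n_coefficients, n_frames)
    return Sx.mean(axis=-1)                     # fixed-length feature vector

# Toy usage: random signals standing in for healthy / disordered voices.
rng = np.random.default_rng(2)
X = np.stack([wst_features(rng.normal(size=2 ** 14)) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)               # 0 = healthy, 1 = disordered
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.score(X, y))
```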

Citations: 0
AVID: A speech database for machine learning studies on vocal intensity
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-01 | DOI: 10.1016/j.specom.2024.103039
Paavo Alku, Manila Kodali, Laura Laaksonen, Sudarsana Reddy Kadiri

Vocal intensity, which is quantified typically with the sound pressure level (SPL), is a key feature of speech. To measure SPL from speech recordings, a standard calibration tone (with a reference SPL of 94 dB or 114 dB) needs to be recorded together with speech. However, most of the popular databases that are used in areas such as speech and speaker recognition have been recorded without calibration information by expressing speech on arbitrary amplitude scales. Therefore, information about vocal intensity of the recorded speech, including SPL, is lost. In the current study, we introduce a new open and calibrated speech/electroglottography (EGG) database named Aalto Vocal Intensity Database (AVID). AVID includes speech and EGG produced by 50 speakers (25 males, 25 females) who varied their vocal intensity in four categories (soft, normal, loud and very loud). Recordings were conducted using a constant mouth-to-microphone distance and by recording a calibration tone. The speech data was labelled sentence-wise using a total of 19 labels that support the utilisation of the data in machine learning (ML)-based studies of vocal intensity based on supervised learning. In order to demonstrate how the AVID data can be used to study vocal intensity, we investigated one multi-class classification task (classification of speech into soft, normal, loud and very loud intensity classes) and one regression task (prediction of the SPL of speech). In both tasks, we deliberately warped the level of the input speech by normalising the signal to have its maximum amplitude equal to 1.0, that is, we simulated a scenario that is prevalent in current speech databases. The results show that using the spectrogram feature with the support vector machine classifier gave an accuracy of 82% in the multi-class classification of the vocal intensity category. In the prediction of SPL, using the spectrogram feature with the support vector regressor gave a mean absolute error of about 2 dB and a coefficient of determination of 92%. We welcome researchers interested in classification and regression problems to utilise AVID in the study of vocal intensity, and we hope that the current results could serve as baselines for future ML studies on the topic.
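
Because a calibration tone with a known reference SPL is recorded on the same channel, the SPL of any speech segment follows from the ratio of RMS amplitudes. A small sketch of that arithmetic is shown below; the 94 dB reference and the toy signals are illustrative, and the database's own calibration protocol should be consulted.

```python
import numpy as np

def spl_from_calibration(speech, cal_tone, cal_spl_db=94.0):
    """Estimate the SPL of a speech segment using a calibration tone
    recorded at a known reference SPL on the same channel."""
    rms_speech = np.sqrt(np.mean(speech ** 2))
    rms_cal = np.sqrt(np.mean(cal_tone ** 2))
    return cal_spl_db + 20.0 * np.log10(rms_speech / rms_cal)

# Toy usage: a segment whose RMS is half the calibration tone's RMS
# lies about 6 dB below the reference level.
t = np.linspace(0, 1, 16000, endpoint=False)
cal = np.sin(2 * np.pi * 1000 * t)                  # 1 kHz calibration tone
speech = 0.5 * np.sin(2 * np.pi * 220 * t)          # stand-in "speech" signal
print(round(spl_from_calibration(speech, cal), 1))  # roughly 94 - 6 = 88.0 dB
```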

Citations: 0
Monophthong vocal tract shapes are sufficient for articulatory synthesis of German primary diphthongs
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-02-01 | DOI: 10.1016/j.specom.2024.103041
Simon Stone, Peter Birkholz

German primary diphthongs are conventionally transcribed using the same symbols used for some monophthong vowels. However, if the corresponding vocal tract shapes are used for articulatory synthesis, the results often sound unnatural. Furthermore, there is no clear consensus in the literature on whether diphthongs have monophthong constituents and, if so, which ones. This study therefore analyzed a set of audio recordings from the reference speaker of the state-of-the-art articulatory synthesizer VocalTractLab to identify likely candidates for the monophthong constituents of the German primary diphthongs. We then evaluated these candidates in a listening experiment with naive listeners to determine a naturalness ranking of these candidates and specialized diphthong shapes. The results showed that the German primary diphthongs can indeed be synthesized with no significant loss in naturalness by replacing the specialized diphthong shapes for the initial and final segments with shapes also used for monophthong vowels.

Citations: 0
The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-01-13 | DOI: 10.1016/j.specom.2024.103038
Ingy Farouk Emara, Nabil Hamdy Shaker

The present study examines the impact of Arab speakers’ phonological and prosodic features on the accuracy of automatic speech recognition (ASR) of non-native English speech. The authors first investigated the perceptions of 30 Egyptian ESL teachers and 70 Egyptian university students towards the L1 (Arabic)-based errors affecting intelligibility and then carried out a data analysis of the ASR of the students’ English speech to find out whether the errors investigated resulted in intelligibility breakdowns in an ASR setting. In terms of the phonological features of non-native speech, the results showed that the teachers gave more weight to pronunciation features of accented speech that did not actually hinder recognition, that the students were mostly oblivious to the L2 errors they made and their impact on intelligibility, and that L2 errors which were not perceived as serious by either teachers or students had negative impacts on ASR accuracy levels. In regard to the prosodic features of non-native speech, it was found that lower speech rates resulted in more accurate speech recognition, higher speech intensity led to fewer deletion errors, and voice pitch did not seem to have any impact on ASR accuracy levels. The study, accordingly, recommends training ASR systems with more non-native data to increase their accuracy levels, as well as paying more attention to remedying non-native speakers’ L1-based errors that are more likely to impact non-native automatic speech recognition.
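
Since the study relates pronunciation and prosody to ASR accuracy and to specific error types such as deletions, the natural measurement is a word error rate with an error breakdown. The following self-contained sketch uses a standard Levenshtein alignment and is not tied to any particular ASR system or to the study's own scoring.

```python
def wer_breakdown(reference, hypothesis):
    """Word error rate plus substitution/deletion/insertion counts,
    computed with a Levenshtein alignment over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = (edits, subs, dels, ins) aligning ref[:i] with hyp[:j]
    d = [[(j, 0, 0, j) for j in range(len(hyp) + 1)]]
    for i in range(1, len(ref) + 1):
        row = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            diag, up, left = d[i - 1][j - 1], d[i - 1][j], row[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = diag                                               # match
            else:
                best = min((diag[0] + 1, diag[1] + 1, diag[2], diag[3]),  # substitution
                           (up[0] + 1, up[1], up[2] + 1, up[3]),          # deletion
                           (left[0] + 1, left[1], left[2], left[3] + 1))  # insertion
            row.append(best)
        d.append(row)
    edits, subs, dels, ins = d[-1][-1]
    return edits / max(len(ref), 1), subs, dels, ins

# Example: one deletion ("sat") and one insertion ("fat") give WER = 2/6.
print(wer_breakdown("the cat sat on the mat", "the cat on the fat mat"))
```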

Citations: 0
Deep temporal clustering features for speech emotion recognition
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2024-01-02 | DOI: 10.1016/j.specom.2023.103027
Wei-Cheng Lin, Carlos Busso

Deep clustering is a popular unsupervised technique for feature representation learning. We recently proposed the chunk-based DeepEmoCluster framework for speech emotion recognition (SER) to adopt the concept of deep clustering as a novel semi-supervised learning (SSL) framework, which achieved improved recognition performances over conventional reconstruction-based approaches. However, the vanilla DeepEmoCluster lacks critical sentence-level temporal information that is useful for SER tasks. This study builds upon the DeepEmoCluster framework, creating a powerful SSL approach that leverages temporal information within a sentence. We propose two sentence-level temporal modeling alternatives using either the temporal-net or the triplet loss function, resulting in a novel temporal-enhanced DeepEmoCluster framework to capture essential temporal information. The key contribution to achieving this goal is the proposed sentence-level uniform sampling strategy, which preserves the original temporal order of the data for the clustering process. An extra network module (e.g., gated recurrent unit) is utilized for the temporal-net option to encode temporal information across the data chunks. Alternatively, we can impose additional temporal constraints by using the triplet loss function while training the DeepEmoCluster framework, which does not increase model complexity. Our experimental results based on the MSP-Podcast corpus demonstrate that the proposed temporal-enhanced framework significantly outperforms the vanilla DeepEmoCluster framework and other existing SSL approaches in regression tasks for the emotional attributes arousal, dominance, and valence. The improvements are observed in fully-supervised learning or SSL implementations. Further analyses validate the effectiveness of the proposed temporal modeling, showing (1) high temporal consistency in the cluster assignment, and (2) well-separated emotional patterns in the generated clusters.
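
A rough PyTorch sketch of the two ingredients named above, sentence-level uniform sampling that keeps chunks in temporal order and a triplet constraint between chunks, is shown below; the chunk size, feature shapes, toy encoder, and the choice of adjacent versus distant chunks as positive and negative are assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn as nn

def uniform_ordered_chunks(features, n_chunks=11, chunk_len=100):
    """Take n_chunks fixed-length chunks at uniformly spaced start frames,
    preserving the original temporal order: (T, F) -> (n_chunks, chunk_len, F)."""
    T = features.shape[0]
    starts = torch.linspace(0, max(T - chunk_len, 0), n_chunks).long()
    return torch.stack([features[int(s):int(s) + chunk_len] for s in starts])

# Triplet constraint: chunks close in time (anchor, positive) should embed
# closer together than a chunk that is far away in time (negative).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(100 * 40, 128))  # toy chunk encoder
triplet = nn.TripletMarginLoss(margin=1.0)

sentence = torch.randn(1000, 40)              # (frames, mel bands), one utterance
chunks = uniform_ordered_chunks(sentence)     # (11, 100, 40), temporal order kept
anchor = encoder(chunks[0:1])
positive = encoder(chunks[1:2])
negative = encoder(chunks[9:10])
triplet(anchor, positive, negative).backward()
```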

Citations: 0
LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2023-12-24 | DOI: 10.1016/j.specom.2023.103028
Zhipeng Chen, Xinheng Wang, Lun Xie, Haijie Yuan, Hang Pan

Researchers have shown growing interest in audio-driven talking head generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper proposes a generic method, LPIPS-AttnWav2Lip, for reconstructing face images of any speaker based on audio. We used a U-Net architecture based on residual CBAM to better encode and fuse audio and visual modal information. Additionally, the semantic alignment module extends the receptive field of the generator network to obtain the spatial and channel information of the visual features efficiently, and matches statistical information of the visual features with the audio latent vector to adjust and inject the audio content information into the visual information. To achieve exact lip synchronization and to generate realistic high-quality images, our approach adopts LPIPS loss, which simulates human judgment of image quality and reduces the possibility of instability during training. The proposed method achieves outstanding performance in terms of lip synchronization accuracy and visual quality, as demonstrated by subjective and objective evaluation results.
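
CBAM, the convolutional block attention module referenced above, applies channel attention followed by spatial attention to a feature map. The sketch below is the standard generic formulation in PyTorch, not the paper's residual variant or its exact layer sizes.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention over a (B, C, H, W) map."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))               # (B, C) from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.spatial(s))

# Toy usage on a batch of visual feature maps.
feat = torch.randn(2, 64, 32, 32)
print(CBAM(64)(feat).shape)                             # torch.Size([2, 64, 32, 32])
```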

Citations: 0
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network
IF 3.2 | CAS Tier 3 (Computer Science) | Q2 ACOUSTICS | Pub Date: 2023-12-14 | DOI: 10.1016/j.specom.2023.103024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li, Jianwu Dang

Deep learning has revolutionized voice activity detection (VAD) by offering promising solutions. However, directly applying traditional features, such as raw waveforms and Mel-frequency cepstral coefficients, to deep neural networks often leads to degraded VAD performance due to noise interference. In contrast, humans possess the remarkable ability to discern speech in complex and noisy environments, which motivated us to draw inspiration from the human auditory system. We propose a robust VAD algorithm called the auditory-inspired masked modulation encoder based convolutional attention network (AMME-CANet), which integrates our AMME with CANet. Firstly, we investigate the design of auditory-inspired modulation features as a deep-learning encoder (AME), effectively simulating the process of sound-signal transmission to inner ear hair cells and subsequent modulation filtering by neural cells. Secondly, building upon the observed masking effects in the human auditory system, we enhance our auditory-inspired modulation encoder by incorporating a masking mechanism, resulting in the AMME. The AMME amplifies cleaner speech frequencies while suppressing noise components. Thirdly, inspired by the human auditory mechanism and capitalizing on contextual information, we leverage the attention mechanism for VAD. This methodology uses an attention mechanism to assign higher weights to contextual information containing richer and more informative cues. Through extensive experimentation and evaluation, we demonstrate the superior performance of AMME-CANet in enhancing VAD under challenging noise conditions.
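
As a loose illustration of what modulation features and masking can look like, the sketch below computes a per-mel-band modulation spectrum and keeps only a speech-dominant modulation band; the use of librosa, the 2-16 Hz band, and all parameter values are assumptions for illustration and do not reproduce the AMME architecture.

```python
import numpy as np
import librosa                                     # assumed available

def masked_modulation_spectrum(wav, sr=16000, n_mels=40, hop=160,
                               band=(2.0, 16.0)):
    """Per-mel-band modulation spectrum with a binary mask that keeps the
    modulation frequencies where speech energy typically concentrates."""
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels,
                                         hop_length=hop)
    logmel = np.log(mel + 1e-8)                    # (n_mels, n_frames)
    frame_rate = sr / hop                          # spectrogram frames per second
    mod = np.abs(np.fft.rfft(logmel, axis=1))      # modulation spectrum per band
    mod_freqs = np.fft.rfftfreq(logmel.shape[1], d=1.0 / frame_rate)
    mask = (mod_freqs >= band[0]) & (mod_freqs <= band[1])
    return mod * mask[None, :], mod_freqs

# Toy usage on one second of noise standing in for audio.
wav = np.random.default_rng(3).normal(size=16000).astype(np.float32)
masked, freqs = masked_modulation_spectrum(wav)
print(masked.shape, freqs.max())
```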

Citations: 0