
2012 8th International Symposium on Chinese Spoken Language Processing: Latest Publications

Intra-conversation intra-speaker variability compensation for speaker clustering
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423465
Kui Wu, Yan Song, Wu Guo, Lirong Dai
Recently, speaker clustering approaches exploiting the intra-conversation variability in the total variability space have shown promising performance. However, variability also exists among different segments of the same speaker within a conversation, termed intra-conversation intra-speaker variability, which may scatter the distribution of the corresponding i-vector based representations of short speech segments and degrade clustering performance. To address this issue, we propose a new speaker clustering approach based on an extended total variability factor analysis. In our proposed method, the intra-conversation total variability space is divided into inter-speaker and intra-speaker variability spaces, and by explicitly compensating for the intra-conversation intra-speaker variability, short speech segments can be represented more accurately. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on the NIST SRE 2008 summed-channel telephone dataset. The experimental results show that the proposed method clearly outperforms other state-of-the-art speaker clustering techniques in terms of clustering error rate.
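As a rough illustration of the idea, the sketch below approximates intra-speaker variability compensation by estimating an intra-speaker scatter matrix from hypothetical development data with known speaker labels, projecting out its dominant directions from segment i-vectors, and then clustering. This is a minimal sketch of the general technique, not the paper's exact factor-analysis formulation; all names and parameters are illustrative.

```python
# Minimal sketch: compensate intra-speaker variability in i-vectors, then cluster.
# Assumes segment i-vectors are already extracted; dev_ivectors/dev_speaker_ids
# are hypothetical development data with known speaker labels.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def intra_speaker_scatter(dev_ivectors, dev_speaker_ids):
    """Scatter of segment i-vectors around their per-speaker means."""
    dim = dev_ivectors.shape[1]
    scatter = np.zeros((dim, dim))
    for spk in np.unique(dev_speaker_ids):
        segs = dev_ivectors[dev_speaker_ids == spk]
        centered = segs - segs.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / len(dev_ivectors)

def compensate(ivectors, scatter, n_nuisance=10):
    # Project out the top n_nuisance intra-speaker (nuisance) directions.
    eigvals, eigvecs = np.linalg.eigh(scatter)      # ascending eigenvalues
    nuisance = eigvecs[:, -n_nuisance:]             # largest-eigenvalue directions
    proj = np.eye(ivectors.shape[1]) - nuisance @ nuisance.T
    comp = ivectors @ proj.T
    return comp / np.linalg.norm(comp, axis=1, keepdims=True)  # length-normalize

def cluster(ivectors, n_speakers=2):
    # Agglomerative clustering on cosine distances of compensated i-vectors.
    z = linkage(ivectors, method="average", metric="cosine")
    return fcluster(z, t=n_speakers, criterion="maxclust")
```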
Citations: 5
Exploring mutual information for GMM-based spectral conversion
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423477
Hsin-Te Hwang, Yu Tsao, H. Wang, Yih-Ru Wang, Sin-Horng Chen
In this paper, we propose a maximum mutual information (MMI) training criterion to refine the parameters of the joint-density GMM (JDGMM) set in order to tackle the over-smoothing issue in voice conversion (VC). Conventionally, the maximum likelihood (ML) criterion is used to train a JDGMM set, which characterizes the joint properties of the source and target feature vectors. The MMI training criterion, on the other hand, updates the parameters of the JDGMM set to increase its capability of modeling the dependency between the source and target feature vectors, and thus to make the converted sounds closer to natural ones. A subjective listening test demonstrates that the quality and individuality of speech converted by the proposed ML followed by MMI (ML+MMI) training method are better than those obtained by the ML training method.
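To make the objective concrete, here is a minimal sketch of one way to evaluate an empirical mutual-information quantity under a joint-density GMM: the average of log p(x,y) minus the marginal log-likelihoods, where the marginals are obtained by slicing the joint components. The paper's exact criterion and its parameter-update rule may differ; this only shows the quantity an MMI refinement would try to increase.

```python
# Minimal sketch of an MMI-style objective for a joint-density GMM (JDGMM):
# I(X;Y) ~ E[log p(x,y) - log p(x) - log p(y)] under the model. The gradient
# update that refines the ML-trained JDGMM is omitted here.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(z, weights, means, covs):
    """Log density of a GMM with full covariances."""
    comp = [np.log(w) + multivariate_normal.logpdf(z, m, c)
            for w, m, c in zip(weights, means, covs)]
    return np.logaddexp.reduce(comp, axis=0)

def mmi_objective(x, y, weights, means, covs):
    """x: (N, dx) source features, y: (N, dy) target features.
    weights/means/covs describe the joint density over [x, y]."""
    dx = x.shape[1]
    z = np.hstack([x, y])
    log_joint = gmm_logpdf(z, weights, means, covs)
    # Marginals of a GMM are GMMs over the corresponding mean/cov blocks.
    log_x = gmm_logpdf(x, weights, [m[:dx] for m in means],
                       [c[:dx, :dx] for c in covs])
    log_y = gmm_logpdf(y, weights, [m[dx:] for m in means],
                       [c[dx:, dx:] for c in covs])
    return np.mean(log_joint - log_x - log_y)
```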
Citations: 8
A feature-transform based approach to unsupervised task adaptation and personalization
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423513
Jian Xu, Zhijie Yan, Qiang Huo
This paper presents a feature-transform based approach to unsupervised task adaptation and personalization for speech recognition. Given task-specific speech data collected from a deployed service, an “acoustic sniffing” module is built first by using a so-called i-vector technique with a number of acoustic conditions identified via i-vector clustering. Unsupervised maximum likelihood training is then performed to estimate a task-dependent feature transform for each acoustic condition, while pre-trained HMM parameters of acoustic models are kept unchanged. Given an unknown utterance, an appropriate feature transform is selected via “acoustic sniffing”, which is used to transform the feature vectors of the unknown utterance for decoding. The effectiveness of the proposed method is confirmed in a task adaptation scenario from a conversational telephone speech transcription task to a short message dictation task. The same method is expected to work for personalization as well.
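The sketch below illustrates the "acoustic sniffing" idea described above: utterance-level i-vectors are clustered into acoustic conditions, and at test time the condition-specific affine feature transform is selected and applied while the acoustic model stays fixed. The transforms are assumed to be pre-estimated by unsupervised ML training (CMLLR-style); all class and function names are illustrative, not from the paper.

```python
# Minimal sketch of i-vector based "acoustic sniffing" plus per-condition
# affine feature transforms. The HMM acoustic model itself stays untouched.
import numpy as np
from sklearn.cluster import KMeans

class AcousticSniffer:
    def __init__(self, dev_ivectors, n_conditions=4):
        # Identify acoustic conditions by clustering development i-vectors.
        self.kmeans = KMeans(n_clusters=n_conditions, n_init=10)
        self.kmeans.fit(dev_ivectors)

    def condition_of(self, ivector):
        return int(self.kmeans.predict(ivector[None, :])[0])

def apply_transform(features, A, b):
    """Affine feature transform x' = A x + b, one (A, b) per condition."""
    return features @ A.T + b

# Hypothetical usage: transforms[k] holds (A, b) estimated for condition k.
# k = sniffer.condition_of(utt_ivector)
# decoded = decode(apply_transform(utt_features, *transforms[k]))
```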
Citations: 1
A phone segmentation method and its evaluation on Mandarin speech corpus
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423515
Dac-Thang Hoang, Hsiao-Chuan Wang
This paper presents a phone segmentation method that requires no prior knowledge of the text content. The proposed method is an unsupervised phone boundary detection technique based on band-energy tracing. It demonstrates better performance than previous work when applied to the TIMIT corpus, but performance degrades when the method is applied to a Mandarin Chinese speech database, the TCC300 corpus. The evaluation on this Mandarin speech corpus reveals some interesting factors that may cause difficulty in detecting phone boundaries. We propose some ideas that may help improve the phone segmentation method in future studies.
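A minimal sketch of a band-energy tracing detector follows: band energies are computed from an STFT, and peaks in the frame-to-frame change of the band-energy vector are taken as candidate phone boundaries. The band count, window sizes, and peak-picking thresholds here are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of unsupervised phone boundary detection by band-energy tracing.
import numpy as np
from scipy.signal import stft, find_peaks

def phone_boundaries(wave, fs, n_bands=8, frame_len=0.025, hop=0.010):
    nperseg = int(frame_len * fs)
    f, t, spec = stft(wave, fs, nperseg=nperseg,
                      noverlap=nperseg - int(hop * fs))
    power = np.abs(spec) ** 2
    # Sum power into a few broad frequency bands per frame.
    band_edges = np.linspace(0, len(f), n_bands + 1, dtype=int)
    bands = np.array([power[a:b].sum(axis=0)
                      for a, b in zip(band_edges[:-1], band_edges[1:])])
    log_bands = np.log(bands + 1e-10)
    # Trace the change of the band-energy vector between adjacent frames.
    change = np.linalg.norm(np.diff(log_bands, axis=1), axis=0)
    # Candidate boundaries sit at prominent peaks of the change curve.
    peaks, _ = find_peaks(change, distance=3, prominence=np.std(change))
    return t[peaks]  # boundary times in seconds
```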
Citations: 1
Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423457
Xin Wang, Zhenhua Ling, Lirong Dai
In our previous work, we presented a cross-stream dependency modeling method for hidden Markov model (HMM) based parametric speech synthesis. In that method, a multi-space probability distribution (MSD) was adopted for F0 modeling, and voicing decision errors severely affected the accuracy of the generated spectral features. Therefore, a cross-stream dependency modeling method using a continuous F0 HMM (CF-HMM) is proposed in this paper to circumvent voicing decisions during the generation of spectral features. In addition, to prevent over-fitting in model training, regression classes are introduced to tie the transform matrices in the dependency models. Experiments on the proposed methods show both an improvement in the accuracy of the generated spectral features and the effectiveness of introducing regression classes in dependency model training.
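For readers unfamiliar with continuous F0 modeling, the sketch below shows the basic preprocessing it builds on: every frame carries a real-valued (log) F0 observation because unvoiced stretches are interpolated, so no multi-space voicing decision is needed inside the F0 stream. Linear interpolation is an assumption for illustration; CF-HMM systems differ in how they fill unvoiced regions.

```python
# Minimal sketch of the continuous-F0 representation underlying CF-HMM:
# interpolate log-F0 through unvoiced regions and keep voicing as a
# separate binary feature.
import numpy as np

def continuous_f0(f0):
    """f0: per-frame F0 in Hz, 0 for unvoiced frames.
    Returns (log-F0 interpolated through unvoiced regions, voicing flags)."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("no voiced frames to interpolate from")
    frames = np.arange(len(f0))
    log_f0 = np.empty_like(f0)
    log_f0[voiced] = np.log(f0[voiced])
    # Linear interpolation (with edge extension) over unvoiced stretches.
    log_f0[~voiced] = np.interp(frames[~voiced], frames[voiced],
                                log_f0[voiced])
    return log_f0, voiced.astype(int)
```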
Citations: 2
Tongue shape synthesis based on Active Shape Model
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423537
Chan Song, Jianguo Wei, Qiang Fang, Shen Liu, Yuguang Wang, J. Dang
Magnetic resonance imaging (MRI) has been widely used in speech production research, since it acquires high-spatial-resolution data of vocal tract shape without any known harm from radiation. However, establishing a comprehensive articulatory database using MRI would be time consuming and expensive due to its low temporal resolution and the large expense of MRI equipment. In this study, we propose a method to interpolate tongue shapes between static vowels to acquire dynamic tongue shapes. First, a set of parameters controlling tongue shape is extracted based on the Active Shape Model (ASM). Then, these control parameters are interpolated to synthesize dynamic tongue shapes from the static vowels' articulations. To evaluate the method, a set of key points was chosen from both the MRI images and the synthesized tongue shapes. The results suggest that the dynamic properties of these key points on the synthesized tongue shapes resemble those of the actual dynamic tongue shapes.
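Below is a minimal sketch of the ASM-style pipeline described: tongue contours are encoded as coefficients over principal shape modes, the coefficients of two static vowel shapes are interpolated, and intermediate shapes are decoded back. Landmark alignment is assumed done upstream, and the PCA-via-SVD construction and mode count are illustrative, not the paper's exact model.

```python
# Minimal sketch of ASM shape parameters and tongue-shape interpolation.
import numpy as np

class ShapeModel:
    def __init__(self, training_shapes, n_modes=5):
        """training_shapes: (N, 2*P) flattened (x, y) contours of P points."""
        self.mean = training_shapes.mean(axis=0)
        centered = training_shapes - self.mean
        # Principal modes of shape variation (rows of vt).
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.modes = vt[:n_modes]

    def encode(self, shape):
        # Project a contour onto the shape modes -> control parameters.
        return self.modes @ (shape - self.mean)

    def decode(self, params):
        # Reconstruct a contour from control parameters.
        return self.mean + params @ self.modes

def interpolate_shapes(model, shape_a, shape_b, n_steps=10):
    """Dynamic tongue shapes between two static vowel shapes."""
    pa, pb = model.encode(shape_a), model.encode(shape_b)
    return [model.decode((1 - t) * pa + t * pb)
            for t in np.linspace(0.0, 1.0, n_steps)]
```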
Citations: 3
The coarticulation resistance of consonants in standard Chinese - An electropalatographic and acoustic study
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423545
Yinghao Li, Jinghua Zhang, Jiangping Kong
The coarticulation resistance (CR) of 21 initial consonants in Standard Chinese was examined in CV monosyllables and symmetrical V1#C2V2 sequences (# stands for a morpheme boundary) by analyzing electropalatographic (EPG) and acoustic signals. The slope of the F2 locus equation was compared with that of an articulatory regression function, calculated by regressing the total linguopalatal contact ratio of the vowel target frame against that of the consonantal release/approach frame. The results show that the slopes derived from the articulatory regression functions were the most appropriate measure for designating consonant CR. Taken together, the CR scale for Standard Chinese is represented by a continuum in ascending order: labial < velar < alveolar < dental, retroflex, and alveolo-palatal consonants. This consonant CR scale applies not only to the CV monosyllable set but also to V1#C2/#C2V2 transitions in V1#C2V2 sequences. The overall results of the paper support the DAC model in that the coarticulation resistance of consonants depends closely on the involvement of the tongue dorsum gesture in segment production.
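The two slope measures compared above reduce to simple linear regressions; the sketch below shows both, assuming per-token formant and EPG contact-ratio measurements are already available. Variable names are illustrative. In both cases a flatter slope indicates higher coarticulation resistance, since the consonant constrains the transition regardless of the following vowel.

```python
# Minimal sketch of the acoustic (F2 locus equation) and articulatory
# regression slopes used as CR measures.
import numpy as np

def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope

# Hypothetical per-token measurements for one consonant:
#   f2_onset, f2_target   : F2 (Hz) at CV transition onset / vowel target
#   contact_release       : linguopalatal contact ratio at consonant release
#   contact_target        : contact ratio at the vowel target frame
# acoustic_slope = regression_slope(f2_target, f2_onset)        # locus equation
# artic_slope    = regression_slope(contact_release, contact_target)
```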
Citations: 1
Acoustic and articulatory analysis on Japanese vowels in emotional speech
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423516
Mengxue Cao, Ai-jun Li, Qiang Fang, Jianguo Wei, Chan Song, J. Dang
Acoustic and articulatory features of Japanese vowels were examined in "Neutral", "Angry", and "Sad" speech using the NDI Wave System. The results suggest that (1) significant differences in the acoustic space, measured by F1 and F2, exist among the emotions: "Angry" is characterized by a horizontally compressed acoustic space, while "Sad" is characterized by a vertically compressed acoustic space. (2) The "front raising" and "retraction and back raising" patterns of the tongue movement mechanism are enhanced by the "Angry" and "Sad" emotions. (3) The lips' dynamically protruding feature is shared by both "Angry" and "Sad", apart from the exception [A]; we suggest this exception is caused by the increase in mouth opening, since mouth opening and the degree of lip protrusion form a pair of complementary features. (4) In the articulatory domain, "Angry" is characterized by an increase in mouth opening and a reduced horizontal movement range of the tongue.
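As one possible way to quantify the horizontal versus vertical compression reported in finding (1), the sketch below computes the convex-hull area of an emotion's vowel tokens in the F1-F2 plane together with the separate F1 and F2 ranges. This is an illustration under stated assumptions, not the paper's exact measure.

```python
# Minimal sketch: quantify F1/F2 acoustic-space compression per emotion.
import numpy as np
from scipy.spatial import ConvexHull

def acoustic_space_stats(f1, f2):
    """f1, f2: formant values (Hz) for one emotion's vowel tokens (>= 3)."""
    pts = np.column_stack([f2, f1])           # x: F2 (front-back), y: F1 (height)
    return {
        "hull_area": ConvexHull(pts).volume,  # a 2-D hull's "volume" is its area
        "f1_range": float(np.ptp(f1)),        # vertical extent of the space
        "f2_range": float(np.ptp(f2)),        # horizontal extent of the space
    }
```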
Citations: 3
Incorporating dynamic features into minimum generation error training for HMM-based speech synthesis
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423486
Duy Khanh Ninh, M. Morise, Y. Yamashita
This paper describes new methods of minimum generation error (MGE) training in HMM-based speech synthesis that introduce an error component for the dynamic features into the generation error function. We propose two methods for setting the weight associated with this additional error component. In the fixed weighting approach, the weight is kept constant over the course of the speech. In the adaptive weighting approach, it is adjusted according to the degree of dynamics of the speech segments. Objective evaluation shows that, compared to the baseline MGE criterion, the newly derived MGE criterion with adaptive weighting obtains comparable performance on static features and better performance on delta features. Subjective evaluation exhibits an improvement in the quality of speech synthesized with the proposed technique. The newly derived criterion improves the HMMs' capability of capturing the dynamic properties of speech without increasing the computational complexity of training compared to the baseline criterion.
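The sketch below shows the general shape of such an error function: a static term plus a weighted delta term, with the weight either fixed or scaled per frame by the local degree of dynamics of the natural trajectory. The delta window and the particular adaptive weighting rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a generation-error function with a dynamic-feature term:
# E = |c - c_hat|^2 + w * |dc - dc_hat|^2, with fixed or adaptive w.
import numpy as np

def delta(c):
    """Simple delta features: half the difference of neighboring frames."""
    padded = np.pad(c, ((1, 1), (0, 0)), mode="edge")
    return 0.5 * (padded[2:] - padded[:-2])

def generation_error(c_nat, c_gen, weight=None):
    """c_nat, c_gen: (T, D) natural / generated static feature trajectories.
    weight=None selects an adaptive per-frame weight from the natural deltas."""
    d_nat, d_gen = delta(c_nat), delta(c_gen)
    static_err = np.sum((c_nat - c_gen) ** 2, axis=1)
    delta_err = np.sum((d_nat - d_gen) ** 2, axis=1)
    if weight is None:
        # Adaptive: emphasize frames where natural speech moves quickly.
        dyn = np.sum(d_nat ** 2, axis=1)
        weight = dyn / (dyn.mean() + 1e-10)
    return np.mean(static_err + weight * delta_err)
```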
Citations: 0
Effects of excitation spread on the intelligibility of Mandarin speech in cochlear implant simulations
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423502
Fei Chen, Tian Guan, L. Wong
Noisy listening conditions remain challenging for most cochlear implant patients. The present study simulated the effects of the decay rate of excitation spread in cochlear implants on the intelligibility of Mandarin speech in noise. Mandarin sentence and tone stimuli were processed by a noise vocoder and presented to normal-hearing listeners for identification. The decay rates of excitation spread were simulated by varying the slopes of the synthesis filters in the noise vocoder. Experimental results showed a significant benefit for Mandarin sentence recognition in noise with a narrower type of excitation, while Mandarin tone identification was relatively robust to the influence of excitation spread. These results suggest that reducing the decay rate of excitation spread may potentially improve speech perception in noise for future cochlear implants.
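For illustration, here is a minimal noise-vocoder sketch in which the order of the synthesis filters stands in for the decay rate of excitation spread: each channel's temporal envelope modulates noise, and a lower-order (shallower-sloped) synthesis filter smears channel energy into neighboring frequencies. Filter design, channel count, and band edges are illustrative assumptions.

```python
# Minimal sketch of a noise vocoder with adjustable synthesis-filter slopes.
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt, hilbert

def noise_vocode(wave, fs, n_channels=8, analysis_order=4, synthesis_order=2):
    """Lower synthesis_order -> shallower slopes -> wider simulated spread."""
    edges = np.logspace(np.log10(100), np.log10(min(7000, fs / 2 - 1)),
                        n_channels + 1)
    out = np.zeros_like(wave, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = [lo / (fs / 2), hi / (fs / 2)]
        # Analysis: extract the band's temporal envelope.
        sos_a = butter(analysis_order, band, btype="bandpass", output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos_a, wave)))
        # Synthesis: envelope-modulated noise through the (possibly shallower)
        # synthesis filter that controls the simulated excitation spread.
        sos_s = butter(synthesis_order, band, btype="bandpass", output="sos")
        out += sosfilt(sos_s, env * np.random.randn(len(wave)))
    return out / (np.max(np.abs(out)) + 1e-10)
```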
Citations: 1