Emotional Speech Synthesis using Subspace Constraints in Prosody

Shinya Mori, T. Moriyama, S. Ozawa
{"title":"Emotional Speech Synthesis using Subspace Constraints in Prosody","authors":"Shinya Mori, T. Moriyama, S. Ozawa","doi":"10.1109/ICME.2006.262725","DOIUrl":null,"url":null,"abstract":"An efficient speech synthesis method that uses subspace constraint in prosody is proposed. Conventional unit selection methods concatenate speech segments stored in database, that require enormous number of waveforms in synthesizing various emotional expressions with arbitrary texts. The proposed method employs principal component analysis to reduce the dimensionality of prosodic components, that also allows us to generate new speech that are similar to training samples. The subspace constraint assures that the prosody of the synthesized speech including F0, power, and speech length hold their correlative relation that training samples of emotional speech have. We assume that the combination of the number of syllables and the accent type determines the correlative dynamics of prosody, for each of which we individually construct the subspace. The subspace is then linearly related to emotions by multiple regression analysis that are obtained by subjective evaluation for the training samples. Experimental results demonstrated that only 4 dimensions were sufficient for representing the prosodic changes due to emotion at over 90% of the total variance. Synthesized emotion were successfully recognized by the listeners of the synthesized speech, especially for \"anger\", \"surprise\", \"disgust\", 'sorrow\", \"boredom\", \"depression\", and \"joy\"","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 IEEE International Conference on Multimedia and Expo","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2006.262725","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

An efficient speech synthesis method that uses a subspace constraint in prosody is proposed. Conventional unit selection methods concatenate speech segments stored in a database, which requires an enormous number of waveforms to synthesize various emotional expressions with arbitrary texts. The proposed method employs principal component analysis to reduce the dimensionality of the prosodic components, which also allows us to generate new speech that is similar to the training samples. The subspace constraint ensures that the prosody of the synthesized speech, including F0, power, and speech length, holds the correlative relations that the training samples of emotional speech have. We assume that the combination of the number of syllables and the accent type determines the correlative dynamics of prosody, and we construct an individual subspace for each such combination. The subspace is then linearly related to emotions by multiple regression analysis on ratings obtained by subjective evaluation of the training samples. Experimental results demonstrated that only 4 dimensions were sufficient to represent the prosodic changes due to emotion, covering over 90% of the total variance. The synthesized emotions were successfully recognized by listeners, especially for "anger", "surprise", "disgust", "sorrow", "boredom", "depression", and "joy".
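The abstract outlines a two-stage pipeline: PCA builds a low-dimensional prosody subspace for each (syllable count, accent type) class, and multiple regression links that subspace to subjective emotion ratings. Below is a minimal sketch of this idea in Python. All data shapes and variable names are hypothetical, and the regression direction shown (emotion ratings to subspace coordinates, which is convenient at synthesis time) is an assumption; the paper's actual feature extraction, class partitioning, and regression setup may differ.

```python
# Sketch of the PCA-subspace-plus-regression pipeline from the abstract.
# Placeholder data stands in for real prosody features and listener ratings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Training data for ONE (syllable-count, accent-type) class; the paper
# constructs a separate subspace for each such combination.
# X: (n_utterances, n_prosody_features) concatenated F0/power/duration vectors
# E: (n_utterances, n_emotions) mean ratings from subjective evaluation
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 60))   # placeholder prosody vectors
E = rng.normal(size=(40, 7))    # placeholder ratings for 7 emotions

# 1) Subspace constraint: PCA keeps only the subspace in which F0, power,
#    and speech length co-vary as they do in the training samples.
#    The paper reports ~4 dimensions covering >90% of the total variance.
pca = PCA(n_components=4)
Z = pca.fit_transform(X)        # subspace coordinates of training samples

# 2) Multiple regression relating emotion ratings to subspace coordinates.
reg = LinearRegression().fit(E, Z)

# 3) Synthesis-time use: choose a target emotion vector, map it into the
#    subspace, and reconstruct a full prosody vector that respects the
#    learned correlations among F0, power, and duration.
e_target = np.zeros(7)
e_target[0] = 1.0                    # e.g. strong "anger", others neutral
z = reg.predict(e_target[None, :])   # (1, 4) subspace coordinates
prosody = pca.inverse_transform(z)   # (1, 60) back in the feature space
print(prosody.shape)
```

Because every generated prosody vector is the inverse transform of a point in the 4-dimensional subspace, F0, power, and length cannot drift apart into combinations never seen in the emotional training samples; that is the constraint the title refers to.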