Voice Conversion by Prosody and Vocal Tract Modification

K. S. Rao, B. Yegnanarayana
{"title":"Voice Conversion by Prosody and Vocal Tract Modification","authors":"K. S. Rao, B. Yegnanarayana","doi":"10.1109/ICIT.2006.92","DOIUrl":null,"url":null,"abstract":"In this paper we proposed some flexible methods, which are useful in the process of voice conversion. The proposed methods modify the shape of the vocal tract system and the characteristics of the prosody according to the desired requirement. The shape of the vocal tract system is modified by shifting the major resonant frequencies (formants) of the short term spectrum, and altering their band- widths accordingly. In the case of prosody modification, the required durational and intonational characteristics are imposed on the given speech signal. In the proposed method, the prosodic characteristics are manipulated using instants of significant excitation. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of the speech signals by using the property of average group delay of minimum phase signals. The manipulations of durational characteristics and pitch contour (intonation pattern) are achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified LP residual is used to excite the time varying filter. The filter parameters are updated according to the desired vocal tract characteristics. The proposed methods are evaluated using listening tests.","PeriodicalId":161120,"journal":{"name":"9th International Conference on Information Technology (ICIT'06)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"9th International Conference on Information Technology (ICIT'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIT.2006.92","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

In this paper we proposed some flexible methods, which are useful in the process of voice conversion. The proposed methods modify the shape of the vocal tract system and the characteristics of the prosody according to the desired requirement. The shape of the vocal tract system is modified by shifting the major resonant frequencies (formants) of the short term spectrum, and altering their band- widths accordingly. In the case of prosody modification, the required durational and intonational characteristics are imposed on the given speech signal. In the proposed method, the prosodic characteristics are manipulated using instants of significant excitation. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of the speech signals by using the property of average group delay of minimum phase signals. The manipulations of durational characteristics and pitch contour (intonation pattern) are achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified LP residual is used to excite the time varying filter. The filter parameters are updated according to the desired vocal tract characteristics. The proposed methods are evaluated using listening tests.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过韵律和声道修改来转换声音
在本文中,我们提出了一些灵活的方法,这些方法在语音转换过程中是有用的。所提出的方法根据期望的要求修改声道系统的形状和韵律的特征。声道系统的形状是通过改变短期频谱的主要共振频率(共振峰),并相应地改变其频带宽度来改变的。在韵律修饰的情况下,对给定的语音信号施加所需的时程和语调特征。在所提出的方法中,使用显著激励的瞬间来控制韵律特性。在发声的情况下,显著兴奋的时刻对应于声门关闭的时刻(epoch),而在非发声的情况下,则对应于一些随机的兴奋,如爆发的开始。利用最小相位信号的平均群延迟特性,从语音信号的线性预测残差中计算出显著激励时刻。持续时间特征和音高轮廓(音准模式)的操纵是通过在显著激励时刻的知识的帮助下操纵LP残差来实现的。利用改进的LP残差激励时变滤波器。根据期望的声道特征更新滤波器参数。采用听力测试对提出的方法进行评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Design of Novel Reversible Carry Look-Ahead BCD Subtractor Design of a Framework for Handling Security Issues in Grids A Programmable Parallel Structure to perform Galois Field Exponentiation Voice Conversion by Prosody and Vocal Tract Modification Use of Instance Typicality for Efficient Detection of Outliers with Neural Network Classifiers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1