Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition

Da-Rong Liu, Chi-Yu Yang, Szu-Lin Wu, Hung-yi Lee
Published in: 2018 IEEE Spoken Language Technology Workshop (SLT), 2018-12-01
DOI: 10.1109/SLT.2018.8639672 (https://doi.org/10.1109/SLT.2018.8639672)
Citations: 17

Abstract

An end-to-end TTS model can take an utterance as a reference and generate speech from text with prosody and speaker characteristics similar to those of the reference utterance. Ideally, the transcription of the reference utterance does not need to match the text to be synthesized, so unsupervised style transfer can be achieved. In the previous model, however, only matched text and speech are used in training, so feeding it unmatched text and speech at test time makes it synthesize blurry speech. In this paper, we propose to mitigate the problem by using unmatched text and speech during training and by using the ASR accuracy of an end-to-end ASR model to guide the training procedure. The experimental results show that, with the guidance of end-to-end ASR, both the ASR accuracy (objective evaluation) and the listener preference (subjective evaluation) of the speech generated by the TTS model improve. Moreover, we propose an attention consistency loss as regularization, which is shown to accelerate training.
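The abstract names three training signals but gives no formulas, so the following is only a rough sketch of how they might be combined, not the authors' implementation. Everything here is an assumption made for illustration: the function names, the L1 form of the reconstruction term, the cross-entropy form of the ASR guidance term, the squared-difference form of the attention consistency term (comparing TTS attention with transposed ASR attention), and the weights.

```python
import math

# Illustrative sketch only: all names, loss forms and weights are assumptions,
# not taken from the paper.

def l1_loss(pred, target):
    """Mean absolute error between two equally shaped 2-D lists (frames x bins).
    Assumed reconstruction term for matched text/speech pairs."""
    total = sum(abs(p - t)
                for pred_row, target_row in zip(pred, target)
                for p, t in zip(pred_row, target_row))
    count = sum(len(row) for row in pred)
    return total / count

def asr_cross_entropy(token_probs, target_ids):
    """Average negative log-likelihood of the input text under the ASR output.
    token_probs[i] is the ASR distribution over the vocabulary at text step i.
    Assumed guidance term for speech synthesized from unmatched pairs."""
    nll = -sum(math.log(row[idx]) for row, idx in zip(token_probs, target_ids))
    return nll / len(target_ids)

def attention_consistency(tts_attn, asr_attn):
    """Mean squared difference between the TTS attention (frames x tokens)
    and the transposed ASR attention (tokens x frames). Hypothetical form."""
    frames, tokens = len(tts_attn), len(tts_attn[0])
    squared = sum((tts_attn[f][k] - asr_attn[k][f]) ** 2
                  for f in range(frames) for k in range(tokens))
    return squared / (frames * tokens)

def training_loss(recon, asr_ce, attn, asr_weight=1.0, attn_weight=0.1):
    """Weighted sum of the three terms; the weights are placeholder values."""
    return recon + asr_weight * asr_ce + attn_weight * attn

# Toy usage: 3 spectrogram frames, 2 text tokens, vocabulary of size 2.
recon = l1_loss([[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]])
asr_ce = asr_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
attn = attention_consistency([[1, 0], [0, 1], [0, 1]], [[1, 0, 0], [0, 1, 1]])
loss = training_loss(recon, asr_ce, attn)
```

In a real system each term would be computed on batched tensors and the ASR term would be backpropagated through the synthesized speech into the TTS model; the plain-list version above only shows how the pieces fit together.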