Enhancing Prosodic Features by Adopting Pre-trained Language Model in Bahasa Indonesia Speech Synthesis

Lixuan Zhao, Jian Yang, Qinglai Qin
Published in: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence
Publication date: 2020-12-24
DOI: https://doi.org/10.1145/3446132.3446196
Citations: 3

Abstract

Deep neural network text-to-speech (TTS) systems can produce high-quality audio. However, modern TTS systems usually need a sizable set of studio-quality text-audio pairs as input. Because Bahasa Indonesia has received little research attention, the available data are usually worse in terms of both quality and size. End-to-end (E2E) TTS systems trained on such corpora struggle to generate satisfactory speech; in particular, the prosodic features are not pronounced. We therefore propose a method that enhances the prosodic features of synthesized speech by combining the GST-Tacotron2 model with a pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers). BERT, trained on a large amount of unlabeled text, contains rich linguistic information that can help TTS systems produce more pronounced prosodic features. A subjective evaluation of our experimental results shows that the proposed method does enhance the rhythm of the synthesized speech.
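
The abstract does not say how the BERT representation is combined with GST-Tacotron2. One common way to condition a sequence encoder on an utterance-level text embedding is to append that embedding to every encoder time step; the sketch below illustrates only that generic idea with plain Python lists. The function name, the toy dimensions, and the concatenation strategy itself are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def condition_on_text_embedding(
    encoder_outputs: List[Vector],
    text_embedding: Vector,
) -> List[Vector]:
    """Append one utterance-level text embedding to every encoder time step.

    Hypothetical helper: it stands in for the step where linguistic
    features from a pre-trained language model are made visible to the
    TTS decoder. The inputs are not modified; new lists are returned.
    """
    return [step + text_embedding for step in encoder_outputs]


# Toy stand-ins: 3 encoder time steps of width 2 and a 2-dimensional
# "BERT" vector. Real encoder states and BERT vectors are far wider.
encoder_outputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
bert_sentence_vector = [1.0, -1.0]  # assumed, not taken from the paper

conditioned = condition_on_text_embedding(encoder_outputs, bert_sentence_vector)

# Each time step now carries its own features plus the shared text vector.
for step in conditioned:
    print(step)
```

In a real system the same operation would run on tensors, and the combined sequence would be what the attention mechanism and decoder consume; whether the paper concatenates, adds, or attends over BERT features is not stated in the abstract.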