
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020: Latest Publications

The Tencent speech synthesis system for Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-4
Qiao Tian, Zewang Zhang, Linghui Chen, Heng Lu, Chengzhu Yu, Chao Weng, Dong Yu
This paper presents the Tencent speech synthesis system for Blizzard Challenge 2020. The corpus released to the participants this year included a TV news broadcasting corpus of around 8 hours by a Chinese male host (2020-MH1 task), and a Shanghainese speech corpus of around 6 hours (2020-SS1 task). We built a DurIAN-based speech synthesis system for the 2020-MH1 task and a Tacotron-based system for the 2020-SS1 task. For the 2020-MH1 task, a multi-speaker DurIAN-based acoustic model was first trained on linguistic features to predict mel spectrograms, and the model was then fine-tuned on the provided corpus only. For the 2020-SS1 task, instead of training on hard-aligned phone boundaries, a Tacotron-like end-to-end system was applied to learn the mapping between phonemes and mel spectrograms. Finally, a modified WaveRNN model conditioned on the predicted mel spectrograms was trained to generate the speech waveform. Our team is identified as L, and the evaluation results show that our systems perform very well across the various tests; in particular, we took first place in the overall speech intelligibility test.
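The acoustic models and neural vocoders described in this pipeline communicate through mel spectrograms. As a minimal, illustrative sketch of that intermediate representation (not the authors' code; the 16 kHz sample rate, 1024-point FFT, 256-sample hop, and 80 mel bands are assumed settings), a log-mel spectrogram can be computed with NumPy as follows:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for j in range(l, c):          # rising slope of the triangle
            fb[i, j] = (j - l) / (c - l)
        for j in range(c, r):          # falling slope of the triangle
            fb[i, j] = (r - j) / (r - c)
    return fb

def mel_spectrogram(wave, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # Frame, window, FFT, then project the power spectrum onto mel bands.
    frames = [wave[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(wave) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2
    return np.log(mel_filterbank(n_mels, n_fft, sr) @ spec.T + 1e-10)

# One second of a 440 Hz tone -> an (80, n_frames) log-mel matrix.
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
mel = mel_spectrogram(wave)
```

An acoustic model is trained to predict a matrix of this shape from text features, and the vocoder inverts it back to a waveform.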
Citations: 4
The Ajmide Text-To-Speech System for Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-13
Beibei Hu, Zilong Bai, Qiang Li
This paper presents the Ajmide team's text-to-speech system for task MH1 of Blizzard Challenge 2020. The task is to build a voice from about 9.5 hours of speech from a male native speaker of Mandarin. We built a speech synthesis system in an end-to-end style. The system consists of a BERT-based text front end that processes both Chinese and English text, a multi-speaker Tacotron2 model that converts the phoneme and linguistic-feature sequence into a mel spectrogram, and a modified WaveRNN vocoder that generates the audio waveform from the mel spectrogram. The listening evaluation results show that our system, identified as P, performs well in terms of naturalness, intelligibility, and the aspects of intonation, emotion, and listening effort.
Citations: 0
Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-17
Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhenhua Ling, Lirong Dai
Although the N10 system in the Voice Conversion Challenge 2018 (VCC 18) achieved excellent voice conversion results in both speech naturalness and speaker similarity, its performance was limited by some modeling insufficiencies. In this paper, we propose to overcome these limitations by introducing three modifications. First, we substitute an autoregressive model to improve the conversion model's capability; second, we use a high-fidelity WaveNet to model 24 kHz/16-bit waveforms to improve the naturalness of the converted speech; third, a duration adjustment strategy is proposed to compensate for the obvious speech-rate difference between source and target speakers. Experimental results show that our proposed method improves conversion performance significantly. Furthermore, we validate the performance of this system for cross-lingual voice conversion by applying it directly to the cross-lingual task in the Voice Conversion Challenge 2020 (VCC 2020). The released official subjective results show that our system obtains the best performance in converted-speech naturalness and performance comparable to the best system in speaker similarity, which indicates that our proposed method achieves state-of-the-art cross-lingual voice conversion performance as well.
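The abstract does not detail the duration adjustment strategy. One plausible minimal sketch (purely illustrative, not the paper's method) is a global rescaling of per-phoneme frame durations by the ratio of the source and target speaking rates, with cumulative rounding so the total length is preserved:

```python
import numpy as np

def adjust_durations(src_durs, src_rate, tgt_rate):
    """Scale per-phoneme frame durations by the ratio of speaking rates.

    This is an assumed global linear rescaling, not the paper's exact
    strategy.  Cumulative rounding avoids the drift that rounding each
    duration independently would introduce.
    """
    scale = src_rate / tgt_rate              # slower target -> longer durations
    scaled = np.asarray(src_durs, dtype=float) * scale
    cum = np.round(np.cumsum(scaled)).astype(int)
    out = np.diff(np.concatenate(([0], cum)))
    return np.maximum(out, 1)                # every phoneme keeps >= 1 frame

# Source speaks at 5.0 syllables/s, target at 4.0: durations stretch by 1.25x.
durs = adjust_durations([10, 4, 7], src_rate=5.0, tgt_rate=4.0)
```

The rescaled durations can then drive the autoregressive conversion model's frame alignment in place of the source durations.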
Citations: 17
The RoyalFlush Synthesis System for Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-9
Jian Lu, Zeru Lu, Ting-ting He, Peng Zhang, Xinhui Hu, Xinkang Xu
The paper presents the RoyalFlush synthesis system for Blizzard Challenge 2020. The two required voices were built from the released Mandarin and Shanghainese data. Based on end-to-end speech synthesis technology, several improvements over last year's system were introduced. First, a Mandarin front end that transforms input text into a phoneme sequence with prosody labels is employed. Then, to improve speech stability, a modified Tacotron acoustic model is proposed. Moreover, we apply a GMM-based attention mechanism for robust long-form speech synthesis. Finally, a lightweight LPCNet-based neural vocoder is adopted to achieve a good tradeoff between effectiveness and efficiency. Among all participating teams in the Challenge, the identifier for our system is N. Evaluation results demonstrate that our system performs relatively well in intelligibility, but it still needs improvement in naturalness and similarity.
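GMM-based attention of this kind generally follows the Graves-style mixture-of-Gaussians scheme, in which each mixture mean can only move forward along the encoder sequence; that monotonicity is what makes it robust for long-form synthesis. A minimal NumPy sketch of one attention step (an illustration under assumed shapes and parameters, not the RoyalFlush implementation; in a real model, delta, sigma, and w are predicted by the decoder at every step):

```python
import numpy as np

def gmm_attention_weights(mu_prev, delta, sigma, w, n_enc):
    """One step of Graves-style GMM attention (illustrative sketch).

    mu_prev: previous mixture means; delta: non-negative forward shifts
    (monotonicity comes from delta >= 0); sigma: mixture widths; w: mixture
    weights.  Returns the new means and the attention distribution over
    the n_enc encoder positions.
    """
    mu = mu_prev + np.maximum(delta, 0.0)            # means only move forward
    pos = np.arange(n_enc)[None, :]                  # (1, n_enc)
    phi = np.exp(-0.5 * ((pos - mu[:, None]) / sigma[:, None]) ** 2)
    alpha = (w[:, None] * phi).sum(axis=0)           # mix the components
    return mu, alpha / alpha.sum()                   # normalize to a distribution

# Two components stepping forward along a 50-frame encoder sequence.
mu, alpha = gmm_attention_weights(
    mu_prev=np.array([3.0, 3.5]), delta=np.array([1.0, 1.0]),
    sigma=np.array([1.5, 1.5]), w=np.array([0.5, 0.5]), n_enc=50)
```

Because the means never move backward, the attention cannot loop or skip over the input, which avoids the repeated or dropped words that content-based attention can produce on long utterances.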
Citations: 0
The OPPO System for the Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-3
Yang Song, Min-Siong Liang, Guilin Yang, Kun Xie, Jie Hao
This paper presents the OPPO text-to-speech system for Blizzard Challenge 2020. A system based on statistical parametric speech synthesis was built, with improvements in both the frontend and the backend. For the Mandarin task, a BERT model was used for the frontend, and a Tacotron acoustic model and a WaveRNN vocoder model were used for the backend. For the Shanghainese task, the frontend was built from scratch, and a Tacotron acoustic model and a MelGAN vocoder model were used for the backend. For the Mandarin task, evaluation results showed that our proposed system performed best in naturalness and achieved near-best results in similarity. For the Shanghainese task, we got poor results on most indicators.
Citations: 0
The Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-1
Xiao Zhou, Zhenhua Ling, Simon King
The Blizzard Challenge 2020 is the sixteenth annual Blizzard Challenge. This year's challenge includes a hub task of synthesizing Mandarin speech and a spoke task of synthesizing Shanghainese speech. Speech data for these two Chinese dialects, along with the corresponding text transcriptions, were provided. Sixteen and eight teams participated in the two tasks, respectively. Listening tests were conducted online to evaluate the performance of the synthetic speech.
Citations: 18
CASIA Voice Conversion System for the Voice Conversion Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-19
Lian Zheng, J. Tao, Zhengqi Wen, Rongxiu Zhong
This paper presents our CASIA (Chinese Academy of Sciences, Institute of Automation) voice conversion system for the Voice Conversion Challenge 2020 (VCC 2020). The CASIA voice conversion system can be separated into two modules: the conversion model and the vocoder. We first extract linguistic features from the source speech. Then, the conversion model takes these linguistic features as inputs, aiming to predict the acoustic features of the target speaker. Finally, the vocoder uses these predicted features to generate the speech waveform of the target speaker. In our system, we use the CBHG conversion model and the LPCNet vocoder for speech generation. To better control the prosody of the converted speech, we use acoustic features of the source speech as additional inputs, including the pitch, a voiced/unvoiced flag, and band aperiodicity. Since the training data in VCC 2020 is limited, we build our system by combining initialization on multi-speaker data with adaptation on the limited data of the target speaker. The results of VCC 2020 rank our CASIA system in second place, with an overall mean opinion score of 3.99 for speech quality and 84% accuracy for speaker similarity.
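Among the auxiliary prosody inputs mentioned above, the pitch and the voiced/unvoiced flag can be estimated per frame with a simple autocorrelation method. The paper does not specify its pitch extractor; the following is an illustrative sketch with assumed search range and voicing threshold:

```python
import numpy as np

def frame_f0(frame, sr, fmin=60.0, fmax=400.0, vu_thresh=0.3):
    """Autocorrelation pitch estimate for one frame (illustrative sketch).

    Returns (f0_hz, voiced_flag).  The search range and threshold are
    assumptions, not values from the paper.
    """
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy < 1e-8:
        return 0.0, False                       # silence -> unvoiced
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)     # candidate period range
    lag = lo + int(np.argmax(ac[lo:hi]))        # strongest periodicity
    voiced = ac[lag] / ac[0] > vu_thresh        # normalized peak -> voicing
    return (sr / lag if voiced else 0.0), voiced

sr = 16000
t = np.arange(1024) / sr
f0, voiced = frame_f0(np.sin(2 * np.pi * 200 * t), sr)   # a 200 Hz tone
```

Running this over successive frames yields the pitch contour and voicing track that can be fed to the conversion model alongside the linguistic features.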
Citations: 11
The HITSZ TTS system for Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-11
Huhao Fu, Yiben Zhang, Kai Liu, Chao Liu
In this paper, we present the techniques used in the HITSZ-TTS entry to Blizzard Challenge 2020. The corpus released to the participants this year is about 10 hours of speech recordings from a Chinese male speaker, with mixed Mandarin and English speech. For this task we built an end-to-end speech synthesis system, divided into the following parts: (1) a front-end module that analyzes the pronunciation and prosody of the text; (2) a phoneme conversion tool; (3) a forward-attention-based sequence-to-sequence acoustic model, jointly learned with prosody labels, that predicts an 80-dimensional mel spectrogram; and (4) a Parallel WaveGAN-based neural vocoder to reconstruct waveforms. This is the first time we have joined the Blizzard Challenge, and the identifier for our system is G. The evaluation results of the subjective listening tests show that the proposed system achieves unsatisfactory performance. The problems in the system are also discussed in this paper.
Citations: 0
The NUS and NWPU system for Voice Conversion Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-26
Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie, Haizhou Li
Citations: 0
The SHNU System for Blizzard Challenge 2020
Pub Date : 2020-10-30 DOI: 10.21437/vcc_bc.2020-2
L. He, Q. Shi, Lang Wu, Jianqing Sun, Renke He, Yanhua Long, Jiaen Liang
This paper introduces the SHNU (team I) speech synthesis system for Blizzard Challenge 2020. The speech data released this year includes two parts: a 9.5-hour Mandarin corpus from a male native speaker and a 3-hour Shanghainese corpus from a female native speaker. Based on these corpora, we built two neural-network-based speech synthesis systems to synthesize speech for both tasks, using the same system architecture for the Mandarin and Shanghainese tasks. Specifically, our systems include a front-end module, a Tacotron-based spectrogram prediction network, and a WaveNet-based neural vocoder. First, a pre-built front-end module is used to generate the character sequence and linguistic features from the training text. Then, a Tacotron-based sequence-to-sequence model generates a mel spectrogram from the character sequence. Finally, a WaveNet-based neural vocoder reconstructs the audio waveform from the mel spectrogram. Evaluation results demonstrated that our system achieved very good performance on both tasks, which proved the effectiveness of the proposed system.
Citations: 1