Multiple Speech Mode Transformation using Adversarial Network

2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON). Pub Date: 2022-12-02. DOI: 10.1109/UPCON56432.2022.9986477

Kumud Tripathi, Jatin Kumar


Citations: 0

Abstract

Multiple Speech Mode Transformation (MSMT) aims to convert speech from one mode to another based on the characteristics of each mode. In this work, we explore inter-conversion among three speech modes (conversation, extempore, and read) while preserving speaker identity and linguistic content. To accomplish this, we use StarGAN-VC, a variant of the Star Generative Adversarial Network (StarGAN). Training does not require parallel utterances of the same sentences, and with a relatively small number of training examples we were able to generate high-quality transformed outputs. Objective and subjective evaluations indicate that the transformed outputs are highly comparable to speech in the target mode.
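As a rough illustration of the setup the abstract describes, the sketch below shows how a single generator could be conditioned on a one-hot code for the target mode, and how a round-trip (cycle-consistency) error can be measured without parallel sentences. This is not the authors' implementation: the names `one_hot`, `conversion_pairs`, `cycle_l1`, and `toy_generator` are hypothetical, the feature vector is a placeholder for acoustic features, and the cycle term is only one ingredient that StarGAN-style models commonly use.

```python
from itertools import permutations

# The three speech modes named in the abstract.
MODES = ["conversation", "extempore", "read"]


def one_hot(mode):
    """Return a one-hot code for a mode, used to condition the generator."""
    return [1.0 if m == mode else 0.0 for m in MODES]


def conversion_pairs():
    """All ordered (source, target) mode pairs one generator must cover."""
    return list(permutations(MODES, 2))


def cycle_l1(features, generator, src, tgt):
    """Mean absolute error between input and its src -> tgt -> src round trip."""
    converted = generator(features, one_hot(tgt))
    restored = generator(converted, one_hot(src))
    return sum(abs(a - b) for a, b in zip(features, restored)) / len(features)


def toy_generator(features, code):
    """Hypothetical stand-in for a trained network: ignores the code."""
    return list(features)


if __name__ == "__main__":
    for src, tgt in conversion_pairs():
        loss = cycle_l1([0.1, 0.2, 0.3], toy_generator, src, tgt)
        print(src, "->", tgt, "cycle L1:", loss)
```

Because the round-trip error compares an utterance only with its own reconstruction, it never needs the same sentence recorded in two modes, which is why this kind of term suits non-parallel training data.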

