Multiple Speech Mode Transformation using Adversarial Network
Kumud Tripathi, Jatin Kumar
DOI: 10.1109/UPCON56432.2022.9986477
Published in: 2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), 2 December 2022
The objective of Multiple Speech Mode Transformation (MSMT) is to transform speech from one form to another based on its mode characteristics. In this work, we explore the inter-conversion of three modes of speech (conversation, extempore, and read) while preserving the speaker identity and the linguistic content. To accomplish this, we use a variant of the Star Generative Adversarial Network (StarGAN) called StarGAN-VC. Our model does not require parallel occurrences of the same sentences for training, and with a relatively small number of training examples it generates high-quality transformed outputs. Objective and subjective evaluations indicate that the transformed speech mode outputs are highly comparable to the target speech mode.
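The ability to train without parallel sentence pairs follows from StarGAN-VC's loss design, which replaces a frame-aligned reconstruction loss with an adversarial term plus auxiliary mode-classification, cycle-consistency, and identity-mapping terms. A minimal sketch of how such a generator objective is assembled is shown below; the lambda weighting coefficients are illustrative assumptions, not values reported in this paper:

```python
def generator_objective(adv_loss, cls_loss, cyc_loss, id_loss,
                        lambda_cls=1.0, lambda_cyc=10.0, lambda_id=5.0):
    """Weighted sum of StarGAN-VC-style generator loss terms.

    adv_loss: adversarial loss (output must fool the discriminator
              for the target mode)
    cls_loss: auxiliary classifier loss (output must be recognized
              as the target mode)
    cyc_loss: cycle-consistency loss (converting back to the source
              mode should recover the input; this term is what removes
              the need for parallel training data)
    id_loss:  identity-mapping loss (converting speech to its own mode
              should leave it unchanged)

    The lambda weights here are illustrative defaults, not the
    paper's settings.
    """
    return (adv_loss
            + lambda_cls * cls_loss
            + lambda_cyc * cyc_loss
            + lambda_id * id_loss)

# Example: unit losses with the default weights
total = generator_objective(1.0, 1.0, 1.0, 1.0)  # 1 + 1 + 10 + 5 = 17.0
```

In practice each term would be computed from the generator, discriminator, and classifier networks over acoustic features; the sketch only shows how the terms are combined into a single scalar objective for backpropagation.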