Generation of Artificial F0-contours of Emotional Speech with Generative Adversarial Networks
Authors: Shumpei Matsuoka, Yao Jiang, A. Sasou
Published in: 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1030-1034, 2019-12-01
DOI: 10.1109/SSCI44817.2019.9002917 (https://doi.org/10.1109/SSCI44817.2019.9002917)
Abstract
Fundamental frequency (F0) contours play an important role in conveying a speaker's emotion, identity, intention, and attitude. In this paper, we adopt a generative adversarial network (GAN) to generate artificial F0 contours of emotional speech. GANs have known limitations, however: unstable training frequently produces undesired data, and the generator can repeatedly produce identical or nearly identical samples, a failure known as mode collapse. We construct a GAN-based generative model for F0 contours that stably generates more varied contours that fit the statistical characteristics of the training data. We measured the classification rate for four emotions on F0 contours generated by five generative models. We also evaluated the averaged local density of the generated F0 contours as a measure of their variety. Preliminary experiments confirmed the validity and effectiveness of the proposed generative model.