{"title":"Sound-to-Sound Translation Using Generative Adversarial Network and Sound U-Net","authors":"Yugo Kunisada, C. Premachandra","doi":"10.1109/ICIPRob54042.2022.9798737","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a generic learning method for training conditional generative adversarial networks on audio data. This makes it possible to apply the same generic approach as described in this study to problems that previously required completely different loss formulations when learning audio data. This method can be useful for labeling noises with a certain number of identical frequencies, generating speech labels corresponding to each frequency, and generating audio data for noise cancellation. To achieve this, we propose a sound restoration process based on U-Net, called Sound U-net. In this study, we realized a wide applicability of our system, owing to its ease of implementation without a parameter adjustment, as well as a reduction in the training time for audio data. During the experiment, reasonable results were obtained without manually adjusting the loss function.","PeriodicalId":435575,"journal":{"name":"2022 2nd International Conference on Image Processing and Robotics (ICIPRob)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Image Processing and Robotics (ICIPRob)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIPRob54042.2022.9798737","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 1
Abstract
In this paper, we propose a generic learning method for training conditional generative adversarial networks on audio data. It makes it possible to apply one and the same approach to problems that previously required completely different loss formulations when learning from audio data. The method can be used to label noises that share a certain number of identical frequencies, to generate speech labels corresponding to each frequency, and to generate audio data for noise cancellation. To achieve this, we propose a U-Net-based sound restoration process called Sound U-Net. The system is widely applicable because it is easy to implement, requires no parameter adjustment, and reduces the training time for audio data. In our experiments, reasonable results were obtained without manually adjusting the loss function.
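The paper does not include an implementation, so the following is a minimal sketch of what a waveform-domain Sound U-Net generator could look like in PyTorch. The class name `SoundUNet`, the layer counts, channel widths, and kernel sizes are illustrative assumptions, not the authors' published architecture; the sketch only demonstrates the U-Net pattern (strided 1-D convolutions down, transposed convolutions up, with skip connections) applied to audio.

```python
# Minimal sketch of a 1-D U-Net generator for sound-to-sound translation.
# All hyperparameters below are illustrative assumptions, not taken from
# the paper.
import torch
import torch.nn as nn


class SoundUNet(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        # Encoder: strided 1-D convolutions halve the time resolution.
        self.enc1 = nn.Sequential(nn.Conv1d(in_ch, base, 15, stride=2, padding=7), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv1d(base, base * 2, 15, stride=2, padding=7), nn.LeakyReLU(0.2))
        self.enc3 = nn.Sequential(nn.Conv1d(base * 2, base * 4, 15, stride=2, padding=7), nn.LeakyReLU(0.2))
        # Decoder: transposed convolutions double the resolution; skip
        # connections concatenate matching encoder features (the U-Net part).
        self.dec3 = nn.Sequential(nn.ConvTranspose1d(base * 4, base * 2, 16, stride=2, padding=7), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose1d(base * 4, base, 16, stride=2, padding=7), nn.ReLU())
        self.dec1 = nn.ConvTranspose1d(base * 2, out_ch, 16, stride=2, padding=7)

    def forward(self, x):
        # x: (batch, in_ch, samples), samples divisible by 8.
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))
        return torch.tanh(self.dec1(torch.cat([d2, e1], dim=1)))


g = SoundUNet()
noisy = torch.randn(4, 1, 16384)  # a batch of 1-second, ~16 kHz clips (illustrative)
clean_hat = g(noisy)              # output has the same shape as the input
```

A pix2pix-style objective, i.e. an adversarial loss from a conditional discriminator plus an L1 term between the generated and target waveforms, would be one way to realize the abstract's claim that no per-task loss engineering is needed; that pairing is an assumption on our part and is not spelled out in the abstract.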