Y. Sanada, Takumi Nakagawa, Yuichiro Wada, K. Takanashi, Yuhui Zhang, Kiichi Tokuyama, T. Kanamori, Tomonori Yamada
{"title":"噪声语音去噪的深度自监督学习","authors":"Y. Sanada, Takumi Nakagawa, Yuichiro Wada, K. Takanashi, Yuhui Zhang, Kiichi Tokuyama, T. Kanamori, Tomonori Yamada","doi":"10.21437/interspeech.2022-306","DOIUrl":null,"url":null,"abstract":"In the last few years, unsupervised learning methods have been proposed in speech denoising by taking advantage of Deep Neural Networks (DNNs). The reason is that such unsupervised methods are more practical than the supervised counterparts. In our scenario, we are given a set of noisy speech data, where any two data do not share the same clean data. Our goal is to obtain the denoiser by training a DNN based model. Using the set, we train the model via the following two steps: 1) From the noisy speech data, construct another noisy speech data via our proposed masking technique. 2) Minimize our proposed loss defined from the DNN and the two noisy speech data. We evaluate our method using Gaussian and real-world noises in our numerical experiments. As a result, our method outperforms the state-of-the-art method on average for both noises. In addi-tion, we provide the theoretical explanation of why our method can be efficient if the noise has Gaussian distribution.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"1178-1182"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches\",\"authors\":\"Y. Sanada, Takumi Nakagawa, Yuichiro Wada, K. Takanashi, Yuhui Zhang, Kiichi Tokuyama, T. Kanamori, Tomonori Yamada\",\"doi\":\"10.21437/interspeech.2022-306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the last few years, unsupervised learning methods have been proposed in speech denoising by taking advantage of Deep Neural Networks (DNNs). The reason is that such unsupervised methods are more practical than the supervised counterparts. In our scenario, we are given a set of noisy speech data, where any two data do not share the same clean data. Our goal is to obtain the denoiser by training a DNN based model. Using the set, we train the model via the following two steps: 1) From the noisy speech data, construct another noisy speech data via our proposed masking technique. 2) Minimize our proposed loss defined from the DNN and the two noisy speech data. We evaluate our method using Gaussian and real-world noises in our numerical experiments. As a result, our method outperforms the state-of-the-art method on average for both noises. In addi-tion, we provide the theoretical explanation of why our method can be efficient if the noise has Gaussian distribution.\",\"PeriodicalId\":73500,\"journal\":{\"name\":\"Interspeech\",\"volume\":\"1 1\",\"pages\":\"1178-1182\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interspeech\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/interspeech.2022-306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches
In the last few years, unsupervised learning methods have been proposed in speech denoising by taking advantage of Deep Neural Networks (DNNs). The reason is that such unsupervised methods are more practical than the supervised counterparts. In our scenario, we are given a set of noisy speech data, where any two data do not share the same clean data. Our goal is to obtain the denoiser by training a DNN based model. Using the set, we train the model via the following two steps: 1) From the noisy speech data, construct another noisy speech data via our proposed masking technique. 2) Minimize our proposed loss defined from the DNN and the two noisy speech data. We evaluate our method using Gaussian and real-world noises in our numerical experiments. As a result, our method outperforms the state-of-the-art method on average for both noises. In addi-tion, we provide the theoretical explanation of why our method can be efficient if the noise has Gaussian distribution.