Qinglin Hong, Chia Dai, Hui-Chun Hsu, Zong-Tai Wu, J. Hung
2022 8th International Conference on Applied System Innovation (ICASI), published 2022-04-22. DOI: https://doi.org/10.1109/ICASI55125.2022.9774487
Leveraging the perceptual metric loss to improve the DEMUCS system in speech enhancement
This study aims to improve DEMUCS, a source separation technique, for speech enhancement by revising its loss function. DEMUCS, developed by a Facebook team, builds on Wave-U-Net and consists of convolutional encoder and decoder blocks with an LSTM layer in between. The loss function used in DEMUCS combines a waveform-domain L1 distance with a multi-scale short-time Fourier transform (STFT) loss. We propose revising this loss to take perceptual metric scores into account, namely perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). The new loss function is a weighted sum of the original loss and the STOI and PESQ losses, with the aim of emphasizing the perceptual quality of the enhanced utterances. In preliminary experiments on the VoiceBank-DEMUCS task, the DEMUCS network trained with the modified loss function gives the noise-corrupted utterances superior objective perceptual metric scores (PESQ and STOI). These results indicate that the proposed loss improves the speech enhancement performance of DEMUCS.
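The weighted-sum loss described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the weight parameters `w_orig`, `w_stoi`, and `w_pesq`, and their default values are hypothetical, and the abstract does not say how the STOI and PESQ loss terms or the multi-scale STFT loss are computed, so they are passed in here as precomputed numbers.

```python
from typing import Sequence


def l1_distance(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length waveforms."""
    if len(estimate) != len(target):
        raise ValueError("waveforms must have the same length")
    if len(target) == 0:
        raise ValueError("waveforms must not be empty")
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def original_loss(estimate: Sequence[float], target: Sequence[float],
                  stft_loss: float) -> float:
    """Waveform-domain L1 distance plus a multi-scale STFT loss.

    The STFT term is supplied by the caller (assumption: computed elsewhere).
    """
    return l1_distance(estimate, target) + stft_loss


def combined_loss(orig: float, stoi_loss: float, pesq_loss: float,
                  w_orig: float = 1.0, w_stoi: float = 1.0,
                  w_pesq: float = 1.0) -> float:
    """Weighted sum of the original loss and the STOI and PESQ loss terms.

    The weights are hypothetical; the abstract gives no values for them.
    """
    return w_orig * orig + w_stoi * stoi_loss + w_pesq * pesq_loss


if __name__ == "__main__":
    enhanced = [0.5, -0.5, 0.25]   # toy enhanced waveform
    clean = [0.0, 0.0, 0.25]       # toy clean reference
    base = original_loss(enhanced, clean, stft_loss=0.5)
    total = combined_loss(base, stoi_loss=0.5, pesq_loss=0.25,
                          w_stoi=2.0, w_pesq=4.0)
    print(base, total)
```

Keeping the perceptual terms as separate weighted inputs makes it easy to recover the original DEMUCS loss by setting `w_stoi` and `w_pesq` to zero, which is one way a baseline comparison could be set up.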