Baiyun Liu, Qi Song, Mingxue Yang, Wuwen Yuan, Tianbao Wang
{"title":"基于半监督生成对抗网络的实时丢包隐藏","authors":"Baiyun Liu, Qi Song, Mingxue Yang, Wuwen Yuan, Tianbao Wang","doi":"10.21437/interspeech.2022-10428","DOIUrl":null,"url":null,"abstract":"Packet loss is one of the main reasons for speech quality degradation in voice over internet phone (VOIP) calls. However, the existing packet loss concealment (PLC) algorithms are hard to generate high-quality speech signal while maintaining low computational complexity. In this paper, a causal wave-to-wave non-autoregressive lightweight PLC model (PLCNet) is proposed, which can do real-time streaming process with low latency. In addition, we introduce multiple multi-resolution discriminators and semi-supervised training strategy to improve the ability of the encoder part to extract global features while enabling the decoder part to accurately reconstruct waveforms where packets are lost. Contrary to autoregressive model, PLCNet can guarantee the smoothness and continuity of the speech phase before and after packet loss without any smoothing operations. Experimental results show that PLCNet achieves significant improvements in perceptual quality and intelligibility over three classical PLC methods and three state-of-the-art deep PLC methods. In the INTERSPEECH 2022 PLC Challenge, our approach has ranked the 3rd place on PLCMOS (3.829) and the 3rd place on the final score (0.798).","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"575-579"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"PLCNet: Real-time Packet Loss Concealment with Semi-supervised Generative Adversarial Network\",\"authors\":\"Baiyun Liu, Qi Song, Mingxue Yang, Wuwen Yuan, Tianbao Wang\",\"doi\":\"10.21437/interspeech.2022-10428\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Packet loss is one of the main reasons for speech quality degradation in voice over internet phone (VOIP) calls. However, the existing packet loss concealment (PLC) algorithms are hard to generate high-quality speech signal while maintaining low computational complexity. In this paper, a causal wave-to-wave non-autoregressive lightweight PLC model (PLCNet) is proposed, which can do real-time streaming process with low latency. In addition, we introduce multiple multi-resolution discriminators and semi-supervised training strategy to improve the ability of the encoder part to extract global features while enabling the decoder part to accurately reconstruct waveforms where packets are lost. Contrary to autoregressive model, PLCNet can guarantee the smoothness and continuity of the speech phase before and after packet loss without any smoothing operations. Experimental results show that PLCNet achieves significant improvements in perceptual quality and intelligibility over three classical PLC methods and three state-of-the-art deep PLC methods. In the INTERSPEECH 2022 PLC Challenge, our approach has ranked the 3rd place on PLCMOS (3.829) and the 3rd place on the final score (0.798).\",\"PeriodicalId\":73500,\"journal\":{\"name\":\"Interspeech\",\"volume\":\"1 1\",\"pages\":\"575-579\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interspeech\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/interspeech.2022-10428\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-10428","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
PLCNet: Real-time Packet Loss Concealment with Semi-supervised Generative Adversarial Network
Packet loss is one of the main reasons for speech quality degradation in voice over internet phone (VOIP) calls. However, the existing packet loss concealment (PLC) algorithms are hard to generate high-quality speech signal while maintaining low computational complexity. In this paper, a causal wave-to-wave non-autoregressive lightweight PLC model (PLCNet) is proposed, which can do real-time streaming process with low latency. In addition, we introduce multiple multi-resolution discriminators and semi-supervised training strategy to improve the ability of the encoder part to extract global features while enabling the decoder part to accurately reconstruct waveforms where packets are lost. Contrary to autoregressive model, PLCNet can guarantee the smoothness and continuity of the speech phase before and after packet loss without any smoothing operations. Experimental results show that PLCNet achieves significant improvements in perceptual quality and intelligibility over three classical PLC methods and three state-of-the-art deep PLC methods. In the INTERSPEECH 2022 PLC Challenge, our approach has ranked the 3rd place on PLCMOS (3.829) and the 3rd place on the final score (0.798).