Swin-VEC: Video Swin Transformer-based GAN for video error concealment of VVC
Bing Zhang, Ran Ma, Yu Cao, Ping An
The Visual Computer, published 2024-06-18
DOI: https://doi.org/10.1007/s00371-024-03518-9
Video error concealment can improve the perceived visual quality of videos damaged by packet loss during transmission or by reception errors at the decoder. The latest Versatile Video Coding (VVC) standard improves compression performance further but lacks an error recovery mechanism, which makes the VVC bitstream highly sensitive to errors. Most existing error concealment algorithms were designed for video coding standards that predate VVC and are not applicable to it, so research on video error concealment for VVC is urgently needed. This paper proposes Swin-VEC, a novel deep video error concealment model for VVC that integrates the Video Swin Transformer into the generator of a generative adversarial network (GAN). Specifically, the generator employs a convolutional neural network (CNN) to extract shallow features and the Video Swin Transformer to extract deep multi-scale features. The designed dual upsampling modules then recover the spatiotemporal dimensions and, combined with a CNN, reconstruct the frames. Moreover, an augmented dataset, BVI-DVC-VVC, is constructed for model training and verification, and the model is optimized by adversarial training. Extensive experiments on BVI-DVC-VVC and UCF101 demonstrate the effectiveness and superiority of the proposed model for video error concealment in VVC.
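To illustrate why the generator needs upsampling that restores both temporal and spatial dimensions, the following sketch traces feature-map shapes through a Video Swin Transformer-style encoder. It is only a bookkeeping example under stated assumptions: the 2x4x4 patch size, the embedding width of 96, and the spatial-only 2x patch merging are common Video Swin Transformer defaults, while the three-stage depth, the clip size, and the function names are hypothetical. The abstract does not state which settings Swin-VEC actually uses.

```python
from typing import List, Tuple

# (channels, frames, height, width) of a video clip or feature map
Shape = Tuple[int, int, int, int]


def patch_partition(shape: Shape, patch: Tuple[int, int, int], embed_dim: int) -> Shape:
    """3D patch embedding: every pt x ph x pw block of pixels becomes one token."""
    _, t, h, w = shape
    pt, ph, pw = patch
    if t % pt or h % ph or w % pw:
        raise ValueError("input dimensions must be divisible by the patch size")
    return (embed_dim, t // pt, h // ph, w // pw)


def patch_merging(shape: Shape) -> Shape:
    """Spatial 2x2 merging: halves height and width, doubles the channel count."""
    c, t, h, w = shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even before merging")
    return (2 * c, t, h // 2, w // 2)


def encoder_shapes(
    input_shape: Shape,
    patch: Tuple[int, int, int] = (2, 4, 4),  # assumed default, not from the abstract
    embed_dim: int = 96,                      # assumed default, not from the abstract
    num_stages: int = 3,                      # hypothetical depth
) -> List[Shape]:
    """Return the feature-map shape produced by each encoder stage."""
    shapes = [patch_partition(input_shape, patch, embed_dim)]
    for _ in range(num_stages - 1):
        shapes.append(patch_merging(shapes[-1]))
    return shapes


def required_upsampling(input_shape: Shape, deepest: Shape) -> Tuple[int, int]:
    """Temporal and spatial factors needed to get back to the input resolution."""
    _, t, h, _ = input_shape
    _, td, hd, _ = deepest
    return (t // td, h // hd)


if __name__ == "__main__":
    clip = (3, 8, 256, 256)  # hypothetical clip: 8 RGB frames of 256 x 256 pixels
    stages = encoder_shapes(clip)
    for index, shape in enumerate(stages, start=1):
        print(f"stage {index}: {shape}")
    print("upsampling (temporal, spatial):", required_upsampling(clip, stages[-1]))
```

With these assumed settings, the deepest features have shape (384, 4, 16, 16), so a decoder would have to upsample by a factor of 2 in time and 16 in space before a final convolution can reconstruct full-resolution frames. That is one plausible reading of the "dual upsampling modules" described above, not a confirmed description of the authors' design.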