Swin-VEC: Video Swin Transformer-based GAN for video error concealment of VVC

The Visual Computer, Pub Date: 2024-06-18, DOI: 10.1007/s00371-024-03518-9

Bing Zhang, Ran Ma, Yu Cao, Ping An



Abstract

Video error concealment can effectively improve the perceived visual quality of videos damaged by packet loss during transmission or by reception errors at the decoder. The latest Versatile Video Coding (VVC) standard further improves compression performance but lacks an error recovery mechanism, which makes the VVC bitstream highly sensitive to errors. Most existing error concealment algorithms were designed for video coding standards that predate VVC and are not applicable to it, so research on video error concealment for VVC is urgently needed. This paper proposes Swin-VEC, a novel deep video error concealment model for VVC. The model integrates the Video Swin Transformer into the generator of a generative adversarial network (GAN). Specifically, the generator employs a convolutional neural network (CNN) to extract shallow features and uses the Video Swin Transformer to extract deep multi-scale features. The designed dual upsampling modules then recover the spatiotemporal dimensions and are combined with a CNN to reconstruct the frames. In addition, an augmented dataset, BVI-DVC-VVC, is constructed for model training and validation. The model is optimized through adversarial training. Extensive experiments on BVI-DVC-VVC and UCF101 demonstrate the effectiveness and superiority of the proposed model for video error concealment in VVC.
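
The generator pipeline described in the abstract (shallow CNN features, multi-scale Video Swin Transformer stages, dual upsampling, CNN reconstruction) can be illustrated with a shape trace. This is only a sketch under stated assumptions, not the authors' implementation: the function name `trace_generator_shapes`, the clip size, the feature width, the 2x4x4 patch size, the number of stages, and the reading of "dual upsampling" as one temporal step followed by one spatial step are all hypothetical choices made for illustration. The abstract does not give any of these values.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]  # (channels, frames, height, width)


def trace_generator_shapes(
    frames: int = 8,
    height: int = 128,
    width: int = 128,
    embed_dim: int = 96,                      # assumed feature width
    patch: Tuple[int, int, int] = (2, 4, 4),  # assumed 3D patch size (T, H, W)
    num_stages: int = 3,                      # assumed number of transformer stages
) -> List[Tuple[str, Shape]]:
    """Trace feature-map shapes through a hypothetical Swin-VEC-like generator."""
    pt, ph, pw = patch
    merge = 2 ** (num_stages - 1)  # total spatial reduction from patch merging
    if frames % pt or height % (ph * merge) or width % (pw * merge):
        raise ValueError("clip size must be divisible by the total downsampling factor")

    trace: List[Tuple[str, Shape]] = [("corrupted input clip", (3, frames, height, width))]

    # Shallow feature extraction: a resolution-preserving convolution (assumed).
    c, t, h, w = embed_dim, frames, height, width
    trace.append(("shallow CNN features", (c, t, h, w)))

    # 3D patch partition reduces the temporal and spatial token grid.
    t, h, w = t // pt, h // ph, w // pw
    trace.append(("stage 1 tokens", (c, t, h, w)))

    # Later stages: patch merging halves H and W and doubles the channels,
    # which yields the multi-scale features.
    for stage in range(2, num_stages + 1):
        c, h, w = c * 2, h // 2, w // 2
        trace.append((f"stage {stage} tokens", (c, t, h, w)))

    # "Dual upsampling", read here as a temporal step then a spatial step (assumption).
    t = t * pt
    trace.append(("after temporal upsampling", (c, t, h, w)))
    c, h, w = embed_dim, h * ph * merge, w * pw * merge
    trace.append(("after spatial upsampling", (c, t, h, w)))

    # Final CNN maps features back to RGB frames.
    trace.append(("reconstructed clip", (3, t, h, w)))
    return trace


if __name__ == "__main__":
    for name, shape in trace_generator_shapes():
        print(f"{name:28s} {shape}")
```

The point of the trace is that the reconstructed clip must end with the same frame count and resolution as the corrupted input, which is why the upsampling factors mirror the patch partition and patch merging factors.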


