Swin-VEC: Video Swin Transformer-based GAN for video error concealment of VVC

Bing Zhang, Ran Ma, Yu Cao, Ping An
Journal: The Visual Computer
Publication date: 2024-06-18
Publication type: Journal Article
DOI: 10.1007/s00371-024-03518-9
URL: https://doi.org/10.1007/s00371-024-03518-9
Citations: 0

Abstract

Video error concealment can effectively improve the perceived visual quality of video damaged by packet loss during transmission or by reception errors at the decoder. The latest Versatile Video Coding (VVC) standard further improves compression performance but lacks an error recovery mechanism, which makes the VVC bitstream highly sensitive to errors. Most existing error concealment algorithms were designed for video coding standards that predate VVC and are not applicable to it, so research on video error concealment for VVC is urgently needed. This paper proposes Swin-VEC, a novel deep video error concealment model for VVC. The model integrates the Video Swin Transformer into the generator of a generative adversarial network (GAN). Specifically, the generator uses a convolutional neural network (CNN) to extract shallow features and the Video Swin Transformer to extract deep multi-scale features. Dual upsampling modules then recover the spatiotemporal dimensions and, combined with a CNN, reconstruct the frames. In addition, an augmented dataset, BVI-DVC-VVC, is constructed for model training and verification. The model is optimized through adversarial training. Extensive experiments on BVI-DVC-VVC and UCF101 demonstrate the effectiveness and superiority of the proposed model for video error concealment in VVC.
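
The generator pipeline described in the abstract (shallow CNN features, Video Swin Transformer multi-scale features, two upsampling modules, CNN reconstruction) can be followed as a flow of tensor shapes. The sketch below is only one possible reading and is not the authors' implementation. Every number in it is an assumption made for illustration: the 8-frame 128x128 input, the channel widths, the (2, 4, 4) patch size and the two patch-merging steps (borrowed from common Video Swin Transformer configurations), and the way the two upsampling modules split the work. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Feature-map shape: channels, frames, height, width."""
    c: int
    t: int
    h: int
    w: int


def shallow_cnn(x: Shape, out_channels: int = 32) -> Shape:
    # Assumed stride-1 padded convolution: only the channel count changes.
    return Shape(out_channels, x.t, x.h, x.w)


def patch_embed(x: Shape, patch=(2, 4, 4), embed_dim: int = 96) -> Shape:
    # 3D patch partition: each (pt, ph, pw) block becomes one token.
    pt, ph, pw = patch
    assert x.t % pt == 0 and x.h % ph == 0 and x.w % pw == 0
    return Shape(embed_dim, x.t // pt, x.h // ph, x.w // pw)


def patch_merge(x: Shape) -> Shape:
    # Swin-style patch merging: halve height and width, double channels.
    assert x.h % 2 == 0 and x.w % 2 == 0
    return Shape(2 * x.c, x.t, x.h // 2, x.w // 2)


def upsample(x: Shape, out_channels: int, t_scale: int, s_scale: int) -> Shape:
    # Generic upsampling: scale time by t_scale and space by s_scale.
    return Shape(out_channels, x.t * t_scale, x.h * s_scale, x.w * s_scale)


def generator_shapes(frames: int = 8, height: int = 128, width: int = 128):
    """Return a list of (stage name, Shape) for the hypothetical generator."""
    x = Shape(3, frames, height, width)
    trace = [("input", x)]
    x = shallow_cnn(x)
    trace.append(("shallow_cnn", x))
    x = patch_embed(x)
    trace.append(("patch_embed", x))
    embed_channels = x.c
    for i in range(2):  # assumed: two merging steps give three feature scales
        x = patch_merge(x)
        trace.append((f"patch_merge_{i + 1}", x))
    # Assumed first upsampling module: undo the two spatial merges (x4).
    x = upsample(x, embed_channels, t_scale=1, s_scale=4)
    trace.append(("upsample_1", x))
    # Assumed second upsampling module: undo the patch partition (t x2, s x4).
    x = upsample(x, 32, t_scale=2, s_scale=4)
    trace.append(("upsample_2", x))
    # Final CNN maps features back to 3-channel frames at the input size.
    x = Shape(3, x.t, x.h, x.w)
    trace.append(("reconstruction_cnn", x))
    return trace


if __name__ == "__main__":
    for name, shape in generator_shapes():
        print(f"{name:20s} {shape}")
```

Under these assumptions the output has the same shape as the input, which is the property the abstract refers to as recovering the spatiotemporal dimensions before frame reconstruction.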
