
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Multi-Dimension Aware Back Projection Network For Scene Text Detection
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675323
Yizhan Zhao, Sumei Li, Yongli Chang
Recently, scene text detection based on deep learning has progressed substantially. Nevertheless, most previous FPN-based models are limited by the drawback of sample interpolation algorithms, which fail to generate high-quality up-sampled features. Accordingly, we propose an end-to-end trainable text detector to alleviate this dilemma. Specifically, a Back Projection Enhanced Up-sampling (BPEU) block is proposed to alleviate the drawback of sample interpolation algorithms; it significantly enhances the quality of up-sampled features by employing back projection and detail compensation. Furthermore, a Multi-Dimensional Attention (MDA) block is devised to learn different knowledge from the spatial and channel dimensions, intelligently selecting features to generate more discriminative representations. Experimental results on three benchmarks, ICDAR2015, ICDAR2017-MLT and MSRA-TD500, demonstrate the effectiveness of our method.
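The core of the back-projection idea (up-sample, project back to the low-resolution grid, and compensate with the resulting detail residual) can be illustrated with a short PyTorch sketch; the layer sizes and the refinement convolution below are illustrative assumptions, not the authors' implementation.

# Minimal sketch of back-projection-style up-sampling (illustrative, not the paper's BPEU block)
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackProjectionUpsample(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # 1) coarse up-sampling with plain interpolation
        up = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        # 2) project back to the low-resolution grid
        back = F.interpolate(up, scale_factor=1.0 / self.scale, mode="bilinear", align_corners=False)
        # 3) the residual encodes details lost by interpolation
        residual = x - back
        # 4) up-sample the residual and compensate the coarse estimate
        detail = F.interpolate(residual, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return up + self.refine(detail)

feat = torch.randn(1, 64, 32, 32)
print(BackProjectionUpsample(64)(feat).shape)  # torch.Size([1, 64, 64, 64])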
Citations: 0
DIRECT: Discrete Image Rescaling with Enhancement from Case-specific Textures
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675420
Yan-An Chen, Ching-Chun Hsiao, Wen-Hsiao Peng, Ching-Chun Huang
This paper addresses image rescaling, the task of downscaling an input image and subsequently upscaling it for transmission, storage, or playback on heterogeneous devices. The state-of-the-art image rescaling network (known as IRN) tackles image downscaling and upscaling as mutually invertible tasks using invertible affine coupling layers. In particular, for upscaling, IRN models the missing high-frequency component with an input-independent (case-agnostic) Gaussian noise. In this work, we take one step further and predict a case-specific high-frequency component from textures embedded in the downscaled image. Moreover, we adopt integer coupling layers to avoid quantizing the downscaled image. When tested on commonly used datasets, the proposed method, termed DIRECT, improves high-resolution reconstruction quality both subjectively and objectively, while maintaining visually pleasing downscaled images.
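The integer coupling layers mentioned above can be illustrated with a minimal additive coupling sketch in PyTorch; the transform sub-network is an illustrative placeholder rather than the authors' architecture, but it shows why the mapping stays exactly invertible without quantizing the downscaled branch.

# Minimal sketch of an additive integer coupling layer (illustrative placeholder network)
import torch
import torch.nn as nn

class IntegerCoupling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.t = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        # rounding keeps y2 integer-valued whenever x2 is integer-valued
        y2 = x2 + torch.round(self.t(x1))
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = torch.chunk(y, 2, dim=1)
        x2 = y2 - torch.round(self.t(y1))
        return torch.cat([y1, x2], dim=1)

layer = IntegerCoupling(8)
x = torch.randint(0, 256, (1, 8, 16, 16)).float()
assert torch.allclose(layer.inverse(layer(x)), x)  # exactly invertible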
Citations: 1
Entropy-based Deep Product Quantization for Visual Search and Deep Feature Compression
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675383
Benben Niu, Ziwei Wei, Yun He
With the emergence of various machine-to-machine and machine-to-human deep learning tasks, the amount of deep feature data is increasing. Deep product quantization is widely applied in deep feature retrieval tasks and has achieved good accuracy. However, it is not primarily designed for compression, and its output is a fixed-length quantization index, which is not suitable for subsequent compression. In this paper, we propose an entropy-based deep product quantization algorithm for deep feature compression. Firstly, it introduces entropy into the hard and soft quantization strategies, which adapt to the codebook optimization and codeword determination operations in the training and testing processes, respectively. Secondly, entropy-related loss functions are designed to adjust the distribution of the quantization indices so that it accommodates the subsequent entropy coding module. Experimental results on retrieval tasks show that the proposed method can be generally combined with deep product quantization and its extended schemes, and achieves better compression performance under near-lossless conditions.
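A minimal PyTorch sketch of the soft-quantization side of this idea (soft codeword assignment plus an entropy term over the assignment distribution) is given below; the temperature, codebook size, and the way the entropy would enter the loss are illustrative assumptions.

# Minimal sketch of soft product-quantization assignment with an entropy term (illustrative)
import torch

def soft_quantize(features, codebook, temperature=1.0):
    # features: (N, D), codebook: (K, D)
    d2 = torch.cdist(features, codebook) ** 2          # squared distances, shape (N, K)
    assign = torch.softmax(-d2 / temperature, dim=1)   # soft assignment probabilities
    quantized = assign @ codebook                       # expectation over codewords
    # batch-averaged assignment distribution; its entropy reflects the index statistics
    p = assign.mean(dim=0)
    entropy = -(p * (p + 1e-12).log()).sum()
    return quantized, assign, entropy

feats = torch.randn(32, 16)
codebook = torch.randn(8, 16)
q, a, h = soft_quantize(feats, codebook)
print(q.shape, h.item())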
Citations: 0
Complex Event Recognition via Spatial-Temporal Relation Graph Reasoning
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675337
Hua Lin, Hongtian Zhao, Hua Yang
Events in videos usually involve a variety of factors: objects, environments, actions, and their interaction relations; these factors, as mid-level semantics, can bridge the gap between event categories and video clips. In this paper, we present a novel video event recognition method that uses graph convolution networks to represent and reason about the logical relations among these inner factors. Considering that different kinds of events may focus on different factors, we use transformer networks to extract spatial-temporal features, drawing on the attention mechanism to adaptively assign weights to the key factors of concern. Although transformers generally rely on large datasets, we show the effectiveness of applying a 2D convolution backbone before the transformers. We train and test our framework on the challenging video event recognition dataset UCF-Crime and conduct ablation studies. The experimental results show that our method achieves state-of-the-art performance, outperforming previous advanced models by a significant margin in recognition accuracy.
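The relation-graph reasoning step can be illustrated with a single graph-convolution layer over factor nodes, as in the PyTorch sketch below; the row normalization and feature dimensions are illustrative assumptions rather than the paper's exact design.

# Minimal sketch of one graph-convolution step over factor nodes (illustrative)
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, nodes, adj):
        # nodes: (N, in_dim); adj: (N, N) relation graph including self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        msg = (adj / deg) @ nodes            # row-normalized message passing
        return torch.relu(self.linear(msg))

nodes = torch.randn(5, 128)                  # 5 factor nodes (objects, actions, environment, ...)
adj = torch.eye(5) + torch.rand(5, 5).gt(0.5).float()
print(GraphConv(128, 64)(nodes, adj).shape)  # torch.Size([5, 64])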
Citations: 0
Real-time embedded hologram calculation for augmented reality glasses
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675435
Antonin Gilles
Thanks to its ability to provide accurate focus cues, holography is considered a promising display technology for augmented reality glasses. However, since a hologram contains a large amount of data, its calculation is a time-consuming process that results in prohibitive head-motion-to-photon latency, especially when using embedded calculation hardware. In this paper, we present a real-time hologram calculation method implemented on an NVIDIA Jetson AGX Xavier embedded platform. Our method is based on two modules: an offline pre-computation module and an on-the-fly hologram synthesis module. In the offline calculation module, the omnidirectional light field scattered by each scene object is individually pre-computed and stored in a Look-Up Table (LUT). Then, in the hologram synthesis module, the light waves corresponding to the viewer's position and orientation are extracted from the LUT in real time to compute the hologram. Experimental results show that the proposed method is able to compute 2K1K color holograms at more than 50 frames per second, enabling its use in augmented reality applications.
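The two-stage structure (pre-compute per-point wave patterns offline, then accumulate only the entries relevant to the current viewpoint) can be illustrated with a small NumPy sketch; the wave model, table layout, and visibility weighting here are illustrative assumptions, not the actual Jetson implementation.

# Minimal sketch of LUT pre-computation plus on-the-fly hologram accumulation (illustrative)
import numpy as np

H, W, num_points = 64, 64, 16
rng = np.random.default_rng(0)

# offline stage: one complex field per scene point, standing in for the pre-computed light field
lut = rng.standard_normal((num_points, H, W)) + 1j * rng.standard_normal((num_points, H, W))

def synthesize(visible_indices, weights):
    # on-the-fly stage: accumulate only the entries selected by the viewer's position/orientation
    hologram = np.zeros((H, W), dtype=complex)
    for idx, w in zip(visible_indices, weights):
        hologram += w * lut[idx]
    return np.angle(hologram)  # phase-only pattern for display

print(synthesize([0, 3, 7], [1.0, 0.5, 0.25]).shape)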
Citations: 0
Underwater Image Enhancement with Multi-Scale Residual Attention Network
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675342
Yosuke Ueki, M. Ikehara
Underwater images suffer from low contrast, color distortion and visibility degradation due to light scattering and attenuation. Over the past few years, the importance of underwater image enhancement has increased because of ocean engineering and underwater robotics. Existing underwater image enhancement methods are based on various assumptions. However, it is almost impossible to define appropriate assumptions for underwater images due to their diversity, so such methods are only effective for specific types of underwater images. Recently, underwater image enhancement algorithms using CNNs and GANs have been proposed, but they are not as advanced as other image processing methods due to the lack of suitable training datasets and the complexity of the problem. To address this, we propose a novel underwater image enhancement method that combines a residual feature attention block with a novel combination of multi-scale and multi-patch structures. The multi-patch network extracts local features to adapt to various underwater images, which are often non-homogeneous. In addition, our network includes a multi-scale network, which is often effective for image restoration. Experimental results show that our proposed method outperforms conventional methods for various types of images.
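A residual channel-attention block of the general kind described above can be sketched in a few lines of PyTorch; the reduction ratio and layer sizes are illustrative assumptions, and the multi-scale/multi-patch wiring is omitted.

# Minimal sketch of a residual channel-attention block (illustrative)
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(channels, channels, 3, padding=1))
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
                                  nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        feat = self.body(x)
        # channel-wise reweighting plus a residual connection that keeps low-level detail
        return x + feat * self.attn(feat)

print(ResidualChannelAttention(32)(torch.randn(1, 32, 48, 48)).shape)  # torch.Size([1, 32, 48, 48])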
Citations: 0
Enhanced Cross Component Sample Adaptive Offset for AVS3
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675321
Yunrui Jian, Jiaqi Zhang, Junru Li, Suhong Wang, Shanshe Wang, Siwei Ma, Wen Gao
Cross-component prediction has great potential for removing the redundancy between components. Recently, cross-component sample adaptive offset (CCSAO) was adopted in the third generation of the Audio Video coding Standard (AVS3); it utilizes the intensities of co-located luma samples to determine the offsets of chroma sample filters. However, the frame-level offset is coarse for diverse content, and the edge information of the classified samples is ignored. In this paper, we propose an enhanced CCSAO (ECCSAO) method to further improve coding performance. Firstly, four selectable 1-D directional patterns are added to make the mapping between luma and chroma components more effective. Secondly, a four-layer quad-tree based structure is designed to improve the filtering flexibility of CCSAO. Experimental results show that the proposed approach achieves 1.51%, 2.33% and 2.68% BD-rate savings for the All-Intra (AI), Random-Access (RA) and Low Delay B (LD) configurations, respectively, compared to the AVS3 reference software. A subset of the ECCSAO improvements has been adopted by AVS3.
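The cross-component classification step (index each chroma sample by the intensity band of its co-located luma sample and add a per-band offset) can be illustrated with a short NumPy sketch; the band count, offsets, and the assumption of equally sized luma and chroma planes are illustrative, not the AVS3 syntax.

# Minimal sketch of luma-classified chroma offset filtering (illustrative, not AVS3 syntax)
import numpy as np

def ccsao_filter(chroma, luma, offsets, num_bands=16, bit_depth=8):
    band = (luma.astype(np.int32) * num_bands) >> bit_depth   # class index from co-located luma
    out = chroma.astype(np.int32) + offsets[band]              # per-band signalled offset
    return np.clip(out, 0, (1 << bit_depth) - 1).astype(chroma.dtype)

luma = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
chroma = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
offsets = np.random.randint(-2, 3, 16)
print(ccsao_filter(chroma, luma, offsets))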
Citations: 0
Telemoji: A video chat with automated recognition of facial expressions
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675330
Alex Kreinis, Tom Damri, Tomer Leon, Marina Litvak, Irina Rabaev
Autism spectrum disorder (ASD) is frequently accompanied by impairment in emotional expression recognition, and therefore individuals with ASD may find it hard to interpret emotions and interact. Inspired by this fact, we developed a web-based video chat to assist people with ASD, both for real-time recognition of facial emotions and for practicing. This real-time application detects the speaker's face in a video stream and classifies the expressed emotion into one of seven categories: neutral, surprise, happy, angry, disgust, fear, and sad. The classification is then displayed as a text label below the speaker's face. We developed this application as part of an undergraduate project for the B.Sc. degree in Software Engineering. Its development and testing were carried out in cooperation with the local society for children and adults with autism. The application has been released for unrestricted use at https://telemojii.herokuapp.com/. The demo is available at http://www.filedropper.com/telemojishortdemoblur.
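The described pipeline (detect the speaker's face in each frame, classify the expression, draw the label under the face) can be sketched with OpenCV as below; the classify_emotion function is a hypothetical placeholder for the authors' model, which is not reproduced here.

# Minimal OpenCV sketch of the detect-classify-overlay loop (illustrative)
import cv2

EMOTIONS = ["neutral", "surprise", "happy", "angry", "disgust", "fear", "sad"]
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_emotion(face_img):
    return EMOTIONS[0]   # hypothetical placeholder: a trained emotion classifier would go here

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        label = classify_emotion(frame[y:y + h, x:x + w])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y + h + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Telemoji sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()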
Citations: 0
Pixel Gradient Based Zooming Method for Plenoptic Intra Prediction
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675380
Fan Jiang, Xin Jin, Kedeng Tong
Plenoptic 2.0 videos, which record time-varying light fields with focused plenoptic cameras, are promising for immersive visual applications because they capture densely sampled light fields with high spatial resolution in the rendered sub-apertures. In this paper, an intra prediction method is proposed to compress multi-focus plenoptic 2.0 videos efficiently. Based on the estimation of the zooming factor, novel gradient-feature-based zooming, adaptive-bilinear-interpolation-based tailoring and inverse-gradient-based boundary filtering are proposed and executed sequentially to generate accurate prediction candidates for weighted prediction working with an adaptive skipping strategy. Experimental results demonstrate the superior performance of the proposed method relative to HEVC and state-of-the-art methods.
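The idea of generating an extra prediction candidate by re-sampling a reference block at an estimated zoom factor and blending it with a co-located candidate can be illustrated with a small NumPy/OpenCV sketch; the zoom factor, center crop, and blending weights are illustrative assumptions, not the proposed gradient-based operators.

# Minimal sketch of a zoomed prediction candidate plus weighted blending (illustrative)
import numpy as np
import cv2

def zoomed_candidate(reference, zoom_factor, block_size):
    # re-sample the reference at the estimated zoom factor, then crop (tailor) to the block size
    scaled = cv2.resize(reference, None, fx=zoom_factor, fy=zoom_factor,
                        interpolation=cv2.INTER_LINEAR)
    y0 = (scaled.shape[0] - block_size) // 2
    x0 = (scaled.shape[1] - block_size) // 2
    return scaled[y0:y0 + block_size, x0:x0 + block_size]

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cand_a = zoomed_candidate(ref, 1.25, 32)
cand_b = ref[16:48, 16:48]                       # plain co-located candidate
prediction = (0.6 * cand_a + 0.4 * cand_b).astype(np.uint8)
print(prediction.shape)                          # (32, 32)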
Citations: 2
Reinforcement Learning based ROI Bit Allocation for Gaming Video Coding in VVC
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675345
Guangjie Ren, Zizheng Liu, Zhenzhong Chen, Shan Liu
In this paper, we propose a reinforcement learning based region-of-interest (ROI) bit allocation method for gaming video coding in Versatile Video Coding (VVC). Most current ROI-based bit allocation methods rely on bit budgets derived from frame-level empirical weight allocation. These restricted bit budgets limit the efficiency of ROI-based bit allocation and the stability of video quality. To address this issue, the bit allocation processes of the frame and the ROI are combined and formulated as a Markov decision process (MDP). A deep reinforcement learning (RL) method is adopted to solve this problem and obtain appropriate bit budgets for the frame and the ROI. Our target is to improve the quality of the ROI and reduce frame-level quality fluctuation while satisfying the bit budget constraint. The RL-based ROI bit allocation method is implemented in the latest video coding standard and verified on gaming video coding. The experimental results demonstrate that the proposed method achieves better ROI quality while reducing quality fluctuation compared to the reference methods.
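One way to see the MDP formulation is as a per-frame step whose action is the share of the frame budget given to the ROI, with a reward that trades ROI quality against frame-level fluctuation; the state variables, quality proxy, and reward weights in the Python sketch below are illustrative assumptions, not the paper's exact definition.

# Toy sketch of one MDP step for joint frame/ROI bit allocation (illustrative)
import math
import random

def step(state, roi_ratio):
    remaining_bits, frames_left, last_quality = state
    frame_bits = remaining_bits / frames_left             # even split of the remaining budget
    roi_bits = roi_ratio * frame_bits                     # action: share of the frame budget for the ROI
    quality = math.log1p(roi_bits)                        # toy quality proxy, saturating with bits
    fluctuation = abs(quality - last_quality)
    reward = quality - 0.5 * fluctuation                  # favor ROI quality, penalize fluctuation
    next_state = (remaining_bits - frame_bits, frames_left - 1, quality)
    return next_state, reward

state = (100_000.0, 30, 0.0)
for _ in range(3):
    state, reward = step(state, roi_ratio=random.uniform(0.3, 0.7))
    print(round(reward, 3))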
Citations: 4