
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Visually Optimized Two-Pass Rate Control for Video Coding Using the Low-Complexity XPSNR Model
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675364
Christian R. Helmrich, Ivan Zupancic, J. Brandenburg, Valeri George, A. Wieckowski, B. Bross
Two-pass rate control (RC) schemes have proven useful for generating low-bitrate video-on-demand or streaming catalogs. Visually optimized encoding, particularly with latest-generation coding standards such as Versatile Video Coding (VVC), is however still a subject of intensive study. This paper describes the two-pass RC method integrated into version 1 of VVenC, an open VVC encoding software package. The RC design is based on a novel two-step rate-quantization parameter (R-QP) model to derive the second-pass coding parameters, and it uses the low-complexity XPSNR visual distortion measure to provide numerically as well as visually stable, perceptually R-D optimized encoding results. Random-access evaluation experiments confirm the improved objective as well as subjective performance of our RC solution.
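To illustrate the general shape of a two-pass scheme, the sketch below fits a toy exponential rate-QP model per frame from first-pass statistics and inverts it in the second pass under a perceptual weighting. This is only a minimal sketch: the exponential model form, the constant BETA, and the xpsnr_weight field are illustrative assumptions, not the two-step R-QP model or the XPSNR weighting actually used in VVenC.

import math

BETA = 0.1  # assumed slope of the toy R-QP model R(QP) = alpha * exp(-BETA * QP)

def second_pass_qps(first_pass_stats, target_total_bits):
    """Toy second pass: distribute the bit budget by perceptual weight and invert
    the per-frame R-QP model fitted from the single first-pass measurement.

    first_pass_stats: list of (bits, qp, xpsnr_weight) tuples, one per frame."""
    total_weight = sum(w for _, _, w in first_pass_stats)
    qps = []
    for bits, qp, weight in first_pass_stats:
        alpha = bits / math.exp(-BETA * qp)                  # fit alpha from pass one
        frame_budget = target_total_bits * weight / total_weight
        qps.append(-math.log(frame_budget / alpha) / BETA)   # invert R(QP) for pass two
    return qps

# Example: three frames encoded at QP 32 in pass one, re-targeted to 60 kbit in total.
print(second_pass_qps([(30000, 32, 1.0), (20000, 32, 1.2), (25000, 32, 0.8)], 60000))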
Citations: 7
Probability-based decoder-side intra mode derivation for VVC
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675443
Yang Wang, Li Zhang, Kai Zhang, Yuwen He, Hongbin Liu
Intra prediction is typically used to exploit the spatial redundancy in video coding. In the latest video coding standard, Versatile Video Coding (VVC), 67 intra prediction modes are adopted for intra prediction. The encoder selects the best of the 67 modes and signals it to the decoder. The bits consumed in signaling the selected mode may limit coding efficiency. To reduce the overhead of signaling the intra prediction mode, a probability-based decoder-side intra mode derivation (P-DIMD) is proposed in this paper. Specifically, an intra prediction mode candidate set is constructed based on the probabilities of the intra prediction modes. The probability of an intra prediction mode is estimated in two main ways. First, textures are typically continuous within a local region, so the intra prediction modes of neighboring blocks are similar to each other. Second, some intra prediction modes are more likely to be used than others. For each intra prediction mode in the constructed candidate set, intra prediction is performed on a template to calculate a cost. The intra prediction mode with the minimum cost is determined as the optimal mode and used for the intra prediction of the current block. Experimental results demonstrate that P-DIMD achieves 0.56% BD-rate saving on average compared to VTM-11.0 under the all-intra configuration.
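A minimal Python sketch of the template-based selection step described above: it assumes the probability-ordered candidate list and a per-mode template predictor are already available and uses a SAD cost. The helper name predict_template is hypothetical; this is not the VVC or P-DIMD implementation.

import numpy as np

def derive_intra_mode(candidates, template_rec, predict_template):
    """Return the candidate mode whose template prediction is closest (SAD)
    to the reconstructed template of the current block.

    candidates: intra modes ordered by estimated probability (assumed given).
    template_rec: reconstructed template samples as a NumPy array.
    predict_template(mode): hypothetical helper predicting the template with `mode`."""
    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        cost = float(np.abs(predict_template(mode) - template_rec).sum())
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode

# Hypothetical usage with a toy per-mode predictor:
# mode = derive_intra_mode(["PLANAR", "DC", "HOR"], template_rec, predict_template)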
Citations: 1
Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675341
Zerui Yang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, H. Xiong
Mixed-precision quantization with adaptive bitwidth allocation for neural networks has achieved higher compression rates and accuracy in classification tasks. However, it has not been well explored for object detection networks. In this paper, we propose a novel mixed-precision quantization scheme with a dynamical Hessian matrix for object detection networks. We iteratively select the layer with the lowest sensitivity based on the Hessian matrix and downgrade its precision until the required compression ratio is reached. The L-BFGS algorithm is utilized to update the Hessian matrix in each quantization iteration. Moreover, we specifically design the loss function for object detection networks by jointly considering the quantization effects on classification and regression loss. Experimental results on RetinaNet and Faster R-CNN show that the proposed DHMQ achieves state-of-the-art performance for quantized object detectors.
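The iterative sensitivity-driven precision downgrade can be sketched as the greedy loop below. The per-layer Hessian sensitivities are assumed to be precomputed and, unlike in the paper, they are not re-estimated with L-BFGS between iterations.

def allocate_bitwidths(sensitivity, sizes, target_avg_bits, init_bits=8, min_bits=2):
    """Greedy mixed-precision allocation sketch: repeatedly lower the bitwidth of
    the least sensitive layer until the size-weighted average bitwidth meets the target.

    sensitivity: Hessian-based sensitivity per layer (assumed precomputed).
    sizes: number of parameters per layer."""
    bits = [init_bits] * len(sizes)

    def avg_bits():
        return sum(b * s for b, s in zip(bits, sizes)) / sum(sizes)

    while avg_bits() > target_avg_bits:
        # pick the least sensitive layer that can still be downgraded
        order = sorted(range(len(bits)), key=lambda i: sensitivity[i])
        for i in order:
            if bits[i] > min_bits:
                bits[i] -= 1
                break
        else:
            break  # every layer already at the minimum bitwidth
    return bits

# Example: three layers, the middle one being the most sensitive.
print(allocate_bitwidths([0.2, 5.0, 0.1], [1_000_000, 200_000, 500_000], target_avg_bits=4))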
Citations: 1
SalGFCN: Graph Based Fully Convolutional Network for Panoramic Saliency Prediction
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675373
Yiwei Yang, Yucheng Zhu, Zhongpai Gao, Guangtao Zhai
Saliency prediction for panoramic images is dramatically affected by the distortion caused by their non-Euclidean geometry. Traditional CNN-based saliency prediction algorithms for 2D images are no longer suitable for 360-degree images. We therefore propose a graph-based fully convolutional network for saliency prediction of 360-degree images, which maps panoramic pixels to spherical graph data structures for representation. The saliency prediction network is based on a residual U-Net architecture, with dilated graph convolutions and an attention mechanism in the bottleneck. Furthermore, we design a fully convolutional layer for graph pooling and unpooling operations in spherical graph space to retain node-to-node features. Experimental results show that our proposed method outperforms other state-of-the-art saliency models on a large-scale dataset.
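As a minimal illustration of the basic building block, the PyTorch sketch below implements a residual graph convolution over a given normalized adjacency matrix of the spherical pixel graph; substituting a k-hop adjacency would mimic a dilated graph convolution. This is an illustrative block, not the SalGFCN architecture.

import torch
import torch.nn as nn

class ResidualGraphConv(nn.Module):
    """Residual graph convolution: x + ReLU(A_hat @ (x W))."""
    def __init__(self, channels):
        super().__init__()
        self.linear = nn.Linear(channels, channels, bias=False)

    def forward(self, x, a_hat):
        # x: (num_nodes, channels); a_hat: (num_nodes, num_nodes) normalized adjacency
        return x + torch.relu(a_hat @ self.linear(x))

# Example: 100 spherical graph nodes with 16 features each.
x = torch.randn(100, 16)
a_hat = torch.eye(100)   # placeholder adjacency; a real graph would connect spherical neighbours
print(ResidualGraphConv(16)(x, a_hat).shape)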
Citations: 3
Deep Motion Flow Aided Face Video De-identification
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675353
Yunqian Wen, Bo Liu, Rong Xie, Jingyi Cao, Li Song
Advances in cameras and web technology have made it easy to capture and share large amounts of face video with an unknown audience and for uncontrollable purposes. This raises increasing concerns about unwanted identity-relevant computer vision systems invading the subjects' privacy. Previous de-identification methods rely on designing novel neural networks and processing face videos frame by frame, which ignores the redundancy and temporal continuity of the data. Moreover, these techniques cannot balance privacy and utility well, and per-frame processing easily causes flicker. In this paper, we present deep motion flow, which creates convincing de-identified face videos with a good privacy-utility tradeoff. It calculates the relative dense motion flow between every two adjacent original frames and runs high-quality image anonymization only on the first frame. The de-identified video is then obtained from the anonymized first frame via the relative dense motion flow. Extensive experiments demonstrate the effectiveness of our proposed de-identification method.
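The propagation idea can be sketched with classical dense optical flow in OpenCV: anonymize only the first frame, then warp the running de-identified frame along the flow between consecutive original frames. Farneback flow and simple backward warping stand in here for the learned deep motion flow of the paper.

import cv2
import numpy as np

def propagate_anonymized(anon_first, original_frames):
    """Return de-identified frames obtained by warping the anonymized first frame
    along dense flow computed between consecutive original (BGR) frames."""
    h, w = anon_first.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    prev_gray = cv2.cvtColor(original_frames[0], cv2.COLOR_BGR2GRAY)
    current, out = anon_first, [anon_first]
    for frame in original_frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # backward flow: for each pixel of the new frame, where it came from in the previous one
        flow = cv2.calcOpticalFlowFarneback(gray, prev_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        current = cv2.remap(current, grid_x + flow[..., 0], grid_y + flow[..., 1],
                            cv2.INTER_LINEAR)
        out.append(current)
        prev_gray = gray
    return out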
Citations: 1
Analyzing Time Complexity of Practical Learned Image Compression Models
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675424
Xiaohan Pan, Zongyu Guo, Zhibo Chen
We have witnessed the rapid development of learned image compression (LIC). The latest LIC models outperform almost all traditional image compression standards in terms of rate-distortion (RD) performance. However, the time complexity of LIC models remains underexplored, limiting their practical application in industry. Even with GPU acceleration, LIC models still struggle with long coding times, especially on the decoder side. In this paper, we analyze and test several prevailing and representative LIC models, and compare their complexity with traditional codecs including H.265/HEVC intra and H.266/VVC intra. We provide a comprehensive analysis of every module in the LIC models and investigate how bitrate changes affect coding time. We observe that the time complexity bottleneck mainly lies in entropy coding and context modelling. Although this paper focuses on experimental statistics, our analysis reveals some insights for further acceleration of LIC models, such as model modification for parallel computing, model pruning, and a more parallel context model.
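A model-agnostic timing harness for such per-module measurements might look like the sketch below; CUDA synchronization ensures asynchronous GPU kernels are counted. The module split in the commented usage (analysis transform, entropy model, synthesis transform) is an assumption about how a typical LIC model is organized, not the exact protocol of the paper.

import time
import torch

def time_call(fn, *args, repeats=10):
    """Average wall-clock seconds of one module call, including pending GPU work."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(repeats):
            fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

# Hypothetical usage, one entry per module of a learned codec:
# report = {name: time_call(module, x) for name, module in
#           [("analysis", model.g_a), ("entropy", model.entropy_bottleneck), ("synthesis", model.g_s)]}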
Citations: 1
Nanopore Sequencing Simulator for DNA Data Storage
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675388
Eva Gil San Antonio, T. Heinis, Louis Carteron, Melpomeni Dimopoulou, M. Antonini
The exponential increase of digital data and the limited capacity of current storage devices have made clear the need to explore new storage solutions. Thanks to its biological properties, DNA has proven to be a potential candidate for this task, allowing the storage of information at high density for hundreds or even thousands of years. With the release of nanopore sequencing technologies, DNA data storage is one step closer to becoming a reality. Many works have proposed solutions for simulating this sequencing step, aiming to ease the development of algorithms that process nanopore-sequenced reads. However, these simulators target the sequencing of complete genomes, whose characteristics differ from those of synthetic DNA. This work presents a nanopore sequencing simulator targeting synthetic DNA in the context of DNA data storage.
Citations: 2
Face 2D to 3D Reconstruction Network Based on Head Pose and 3D Facial Landmarks
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675325
Yuanquan Xu, Cheolkon Jung
Although most existing methods based on the 3D morphable model (3DMM) need annotated parameters as ground truth for training, only a few datasets contain them. Moreover, it is difficult to acquire accurate 3D face models aligned with the input images due to the gap in dimensions. In this paper, we propose a face 2D-to-3D reconstruction network based on head pose and 3D facial landmarks. We build a head-pose-guided face reconstruction network that regresses an accurate 3D face model with the help of 3D facial landmarks. Unlike 3DMM parameters, head pose and 3D facial landmarks can be estimated successfully even on in-the-wild images. Experiments on the 300W-LP, AFLW2000-3D and CelebA-HQ datasets show that the proposed method successfully reconstructs a 3D face model from a single RGB image thanks to 3D facial landmarks, and achieves state-of-the-art performance in terms of normalized mean error (NME).
Citations: 0
A High Accuracy Camera Calibration Method for Sport Videos
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675379
Neng Zhang, E. Izquierdo
Camera calibration for sport videos enables precise and natural delivery of graphics on video footage and several other special effects. This in turn substantially improves the visual experience for the audience and facilitates sports analysis during or after the live show. In this paper, we propose a high-accuracy camera calibration method for sport videos. First, we generate a homography database by uniformly sampling camera parameters; this database includes more than 91 thousand different homography matrices. Then, we use a conditional generative adversarial network (cGAN) to perform semantic segmentation, splitting the broadcast frames into four classes. In a subsequent processing step, we build an effective feature extraction network to extract features from the semantically segmented images. After that, we search the database for the best matching homography. Finally, we refine the homography by image alignment. In a comprehensive evaluation using the 2014 World Cup dataset, our method outperforms other state-of-the-art techniques.
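The database construction and retrieval steps can be sketched as below; project_fn (camera parameters to homography) and feature_fn (feature of a view rendered from a homography) are hypothetical helpers standing in for the paper's camera model and feature extraction network. The retrieved homography would then be refined by image alignment.

import itertools
import numpy as np

def build_homography_database(pans, tilts, focals, project_fn):
    """Uniformly sample camera parameters and store the induced 3x3 homography
    for each sample. project_fn(pan, tilt, focal) -> 3x3 NumPy array (assumed)."""
    return [((p, t, f), project_fn(p, t, f))
            for p, t, f in itertools.product(pans, tilts, focals)]

def retrieve_homography(database, query_feature, feature_fn):
    """Return the database entry whose rendered feature is closest to the feature
    of the semantically segmented query frame (nearest-neighbour search)."""
    costs = [np.linalg.norm(feature_fn(H) - query_feature) for _, H in database]
    return database[int(np.argmin(costs))]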
Citations: 6
See SIFT in a Rain: Divide-and-conquer SIFT Key Point Recovery from a Single Rainy Image
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675434
Ping Wang, Wei Wu, Zhu-jun Li, Yong Liu
The Scale-Invariant Feature Transform (SIFT) is one of the best-known image matching methods and has been widely applied in various visual fields. Because it adopts a difference-of-Gaussians (DoG) pyramid for extrema detection and Gaussian gradient information for description, SIFT yields accurate key points and has shown excellent matching results, except under adverse weather conditions such as rain. To address this issue, in this paper we propose a divide-and-conquer SIFT key point recovery algorithm for a single rainy image. In the proposed algorithm, we do not aim to improve the quality of a derained image; instead, we divide the key point recovery problem into two sub-problems: how to recover the DoG pyramid of the derained image, and how to recover the gradients of the derained Gaussian images at multiple scales. We also propose two separate deep learning networks with different losses and structures to recover them, respectively. This divide-and-conquer scheme, which sets different objectives for SIFT extrema detection and description, leads to very robust performance. Experimental results show that our proposed algorithm achieves state-of-the-art performance on widely used image datasets in both quantitative and qualitative tests.
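For reference on the first sub-problem, the DoG pyramid that the first network is trained to recover can be computed from a clean grayscale image as in the sketch below; the octave and scale handling is simplified relative to the full SIFT construction.

import cv2
import numpy as np

def dog_pyramid(gray, num_octaves=4, scales_per_octave=3, sigma0=1.6):
    """Simplified difference-of-Gaussians pyramid of a float grayscale image."""
    k = 2.0 ** (1.0 / scales_per_octave)
    base = gray.astype(np.float32)
    pyramid = []
    for _ in range(num_octaves):
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma0 * (k ** i))
                     for i in range(scales_per_octave + 1)]
        pyramid.append([g1 - g0 for g0, g1 in zip(gaussians, gaussians[1:])])
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_AREA)
    return pyramid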
Citations: 3