
Latest publications from the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)

Geodesic Disparity Compensation for Inter-View Prediction in VR180
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301750
K. Sivakumar, B. Vishwanath, K. Rose
The VR180 format is gaining considerable traction among the various promising immersive multimedia formats that will arguably dominate future multimedia consumption applications. VR180 enables stereo viewing of a hemisphere about the user. The increased field of view and the stereo setting result in extensive volumes of data that strongly motivate the pursuit of novel efficient compression tools tailored to this format. This paper’s focus is on the critical inter-view prediction module that exploits correlations between camera views. Existing approaches mainly consist of projection to a plane where traditional multi-view coders are applied, and disparity compensation employs simple block translation in the plane. However, warping due to the projection renders such compensation highly suboptimal. The proposed approach circumvents this shortcoming by performing geodesic disparity compensation on the sphere. It leverages the observation that, as an observer moves from one viewpoint to the other, all points on surrounding objects are perceived to move along respective geodesics on the sphere, all of which intersect at the two points where the axis connecting the two viewpoints pierces the sphere. Thus, the proposed method performs inter-view prediction on the sphere by moving pixels along their predefined respective geodesics, and accurately captures the perceived deformations. Experimental results show significant bitrate savings and demonstrate the efficacy of the proposed approach.
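As a rough illustration of the geometric idea, the following sketch moves a unit viewing direction along the great circle passing through that direction and the epipole (the point where the inter-view baseline pierces the sphere). The helper functions, the Rodrigues-rotation step, and the toy values are illustrative assumptions, not the authors' implementation, which additionally involves block-wise disparity search and encoder integration.

```python
import numpy as np

def rodrigues_rotate(v, axis, angle):
    """Rotate vector v about a unit axis by the given angle (Rodrigues' formula)."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def geodesic_shift(p, epipole, disparity_angle):
    """Move a unit viewing direction p along the great circle through p and the
    epipole; the sign of disparity_angle selects the direction along the geodesic."""
    p = p / np.linalg.norm(p)
    e = epipole / np.linalg.norm(epipole)
    n = np.cross(e, p)                    # normal of the plane spanned by e and p
    if np.linalg.norm(n) < 1e-12:         # p coincides with an epipole: no motion
        return p
    return rodrigues_rotate(p, n, disparity_angle)

# Toy usage: shift a viewing direction by a 2-degree disparity along its geodesic.
p = np.array([0.0, 0.0, 1.0])
e = np.array([1.0, 0.0, 0.0])
print(geodesic_shift(p, e, np.deg2rad(2.0)))
```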
Citations: 0
Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301859
Chang Liu, Ke-bin Jia, Pengyu Liu
Compared with traditional High Efficiency Video Coding (HEVC), 3D-HEVC introduces multi-view coding and depth map coding, which leads to a significant increase in coding complexity. In this paper, we propose a low-complexity intra coding algorithm for depth maps based on an end-to-end edge detection network. Firstly, we use the Holistically Nested Edge Detection (HED) network to determine the edge locations of the depth map. Secondly, we use the Otsu method to divide the output of the HED into a foreground region and a background region. Finally, the CU size and the candidate list of intra modes are determined according to the region of the coding tree unit (CTU). Experimental results demonstrate that the proposed algorithm reduces the encoding time by 39.56% on average with negligible degradation of coding performance.
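The region split in the second step is ordinary Otsu thresholding applied to the HED edge response; a self-contained numpy sketch is given below. The HED network itself is not shown, and the comments about how each region steers the CU-size decision are assumptions rather than the paper's exact rules.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist.astype(np.float64) / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))      # first moment up to each level
    mu_t = mu[-1]
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                 # avoid division by zero at the extremes
    sigma_b2 = (mu_t * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b2))

edge_map = np.random.randint(0, 256, (64, 64))   # stand-in for an HED edge response
t = otsu_threshold(edge_map)
foreground = edge_map > t    # edge-rich region: assumed to keep finer CU splits / more intra modes
background = edge_map <= t   # smooth region: assumed to allow larger CUs / a pruned mode list
```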
Citations: 2
Recent Advances in End-to-End Learned Image and Video Compression
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301753
Wen-Hsiao Peng, H. Hang
The DCT-based transform coding technique has been adopted by international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) has good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) briefly summarize the progress of this topic over the past three or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. Recently published autoencoder-based schemes can achieve PSNR similar to BPG (Better Portable Graphics, the H.265 still image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to HEVC/H.265. The CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
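For readers unfamiliar with the autoencoder structure mentioned above, the deliberately tiny PyTorch sketch below shows the usual end-to-end training setup: an analysis transform, a quantized latent, a synthesis transform, and a rate–distortion loss of the form L = R + λ·D. The layer sizes, the straight-through rounding, and the absolute-value rate proxy are simplifying assumptions; practical learned codecs use learned entropy models (e.g., hyperpriors) for the rate term.

```python
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    """A minimal stand-in for an autoencoder-based image codec."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2))
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1))

    def forward(self, x):
        y = self.encode(x)
        y_hat = y + (torch.round(y) - y).detach()   # straight-through "quantization"
        return self.decode(y_hat), y_hat

lam = 0.01                                   # rate-distortion trade-off weight (assumed)
model = TinyCodec()
x = torch.rand(1, 3, 64, 64)
x_hat, y_hat = model(x)
distortion = nn.functional.mse_loss(x_hat, x)
rate_proxy = y_hat.abs().mean()              # crude stand-in for an entropy model
loss = rate_proxy + lam * distortion         # L = R + lambda * D
loss.backward()
```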
Citations: 1
Mining Larger Class Activation Map with Common Attribute Labels
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301872
Runtong Zhang, Fanman Meng, Hongliang Li, Q. Wu, K. Ngan
Class Activation Map (CAM) is the visualization of target regions generated from classification networks. However, a classification network trained with class-level labels has high responses to only a few features of objects, and thus cannot discriminate the whole target. We think that the original labels used in classification tasks are not enough to describe all features of the objects. If we annotate more detailed labels, such as class-agnostic attribute labels, for each image, the network may be able to mine larger CAMs. Motivated by this idea, we propose and design common attribute labels, which are lower-level labels summarized from the original image-level categories to describe more details of the target. Moreover, it should be emphasized that our proposed labels generalize well to unknown categories, since attributes (such as head, body, etc.) in some categories (such as dog, cat, etc.) are common and class-agnostic. That is why we call our proposed labels common attribute labels: they are lower-level and more general than traditional labels. We finish the annotation work based on the PASCAL VOC2012 dataset and design a new architecture to successfully classify these common attribute labels. Then, after fusing the features of the attribute labels into the original categories, our network can mine larger CAMs of objects. Our method achieves better visual CAM results and higher evaluation scores compared with traditional methods.
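For reference, the standard CAM that the paper sets out to enlarge is a channel-weighted sum of the last convolutional feature maps, as sketched below with numpy. The comment about averaging attribute-level CAMs is an assumption about how attribute maps could be fused, not the authors' exact architecture.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard CAM: weight the last conv feature maps by the classifier weights
    of the chosen class and sum over channels."""
    # feature_maps: (K, H, W) activations of the last conv layer
    # fc_weights:   (num_classes, K) weights of the FC layer after global average pooling
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)
    return cam / (cam.max() + 1e-8)

# With common attribute labels, one could additionally average the CAMs of the
# attribute classes (e.g., "head", "body") belonging to the predicted category;
# that fusion step is an assumption here, not the paper's exact design.
feats = np.random.rand(8, 7, 7)
weights = np.random.rand(5, 8)
print(class_activation_map(feats, weights, class_idx=2).shape)   # (7, 7)
```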
Citations: 1
IBC-Mirror Mode for Screen Content Coding for the Next Generation Video Coding Standards
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301811
Jian Cao, Zhen Qiu, Zhengren Li, Fan Liang, Jun Wang
This paper proposes an IBC-Mirror mode for Screen Content Coding (SCC) for the next-generation video coding standards, including Versatile Video Coding (VVC) and Audio Video Standard-3 in China (AVS3). It is the first time the mirror characteristic has been taken into consideration for SCC in VVC/AVS3. Based on the translational motion model of the Intra Block Copy (IBC) mode, the function of "horizontal and vertical flipping" is further added to reduce prediction error and improve coding efficiency. The proposed IBC-Mirror mode is implemented on the latest reference software, including VTM5.0 (VVC) and HPM-5.0 (AVS3). The simulations show that the proposed mode can achieve up to 1~2% (VVC) and 4~7% (AVS3) BD-rate savings for SCC test sequences. Drafts about the mode have been submitted to the AVS meeting and investigated in SCC Core Experiments (CE).
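The added flipping can be pictured as extra prediction candidates during block matching; the sketch below compares a current block against an as-is, horizontally flipped, and vertically flipped reference block and keeps the one with the lowest SAD. Function names and the SAD criterion are illustrative; the actual VTM/HPM integration, block-vector search, and mode signalling are not shown.

```python
import numpy as np

def best_ibc_mirror_match(cur_block, ref_block):
    """Compare a current block against a reference block copied as-is,
    horizontally flipped, and vertically flipped; return the best mode and its SAD."""
    candidates = {
        "copy":  ref_block,
        "hflip": ref_block[:, ::-1],
        "vflip": ref_block[::-1, :],
    }
    sads = {m: np.abs(cur_block.astype(np.int32) - c.astype(np.int32)).sum()
            for m, c in candidates.items()}
    best = min(sads, key=sads.get)
    return best, sads[best]

cur = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
ref = cur[:, ::-1].copy()                # a horizontally mirrored copy of the block
print(best_ibc_mirror_match(cur, ref))   # -> ('hflip', 0)
```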
Citations: 1
DENESTO: A Tool for Video Decoding Energy Estimation and Visualization
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301877
Matthias Kränzler, Christian Herglotz, A. Kaup
Previous research has shown that the decoding energy demand of several video codecs can be estimated accurately using bit-stream-feature-based models. In this paper, we show that visualization with the Decoding Energy Estimation Tool (DENESTO) can help improve the understanding of the decoder's energy demand.
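Bit-stream-feature-based energy models of this kind are typically weighted sums of feature counts extracted from the bit stream; the sketch below only illustrates that general form. The feature names and per-feature energy coefficients are invented placeholders, not values from DENESTO or the authors' trained models.

```python
# Minimal sketch of a bit-stream-feature-based decoding-energy estimate:
# energy ~= sum over features of (count of feature in the bit stream) * (energy per occurrence).
feature_counts = {"intra_cus": 1200, "inter_cus": 5400, "transform_coeffs": 88000}   # assumed counts
energy_per_feature = {"intra_cus": 2.1e-6, "inter_cus": 3.4e-6, "transform_coeffs": 9.0e-9}  # assumed J/feature

estimated_energy = sum(feature_counts[f] * energy_per_feature[f] for f in feature_counts)
print(f"estimated decoding energy: {estimated_energy:.4f} J")
```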
Citations: 2
Robust Visual Tracking Via An Imbalance-Elimination Mechanism
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301862
Jin Feng, Kaili Zhao, Xiaolin Song, Anxin Li, Honggang Zhang
Competitive performance in visual tracking is achieved mostly by tracking-by-detection approaches, whose accuracy relies heavily on a binary classifier that distinguishes targets from distractors in a set of candidates. However, severe class imbalance, with few positives (e.g., targets) relative to negatives (e.g., backgrounds), degrades classification accuracy or increases tracking bias. In this paper, we propose an imbalance-elimination mechanism, which adopts a multi-class paradigm and utilizes a novel candidate generation strategy. Specifically, our multi-class model assigns samples to one positive class and four proposed negative classes, naturally alleviating class imbalance. We define the negative classes by introducing the proportions of targets in samples, whose values explicitly reveal the relative scales between targets and backgrounds. Furthermore, during candidate generation, we exploit such scale-aware negative patterns to help adjust the search areas of candidates to incorporate larger target proportions; thus more accurate target candidates are obtained and more positive samples are included to ease class imbalance simultaneously. Extensive experiments on standard benchmarks show that our tracker achieves favorable performance against state-of-the-art approaches and offers robust discrimination of positive targets and negative patterns.
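To make the class definition concrete, here is a toy labelling rule in the spirit of the abstract: one positive class plus several negative classes binned by the proportion of a candidate sample covered by the target. The IoU cut-off, the bin edges, and the bin count are assumptions, not the paper's actual values.

```python
def assign_class(overlap_area, sample_area, iou_with_target):
    """Toy multi-class labelling: positives by IoU, negatives binned by how much
    of the sample is occupied by the target (a proxy for relative scale)."""
    if iou_with_target > 0.7:
        return "positive"
    proportion = overlap_area / sample_area          # fraction of the sample covered by the target
    edges = [0.0, 0.1, 0.3, 0.6, 1.01]               # four negative bins (assumed)
    for k in range(4):
        if edges[k] <= proportion < edges[k + 1]:
            return f"negative_{k}"
    return "negative_3"

print(assign_class(overlap_area=12.0, sample_area=100.0, iou_with_target=0.2))  # -> negative_1
```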
Citations: 0
Random-access-aware Light Field Video Coding using Tree Pruning Method
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301800
T. N. Huu, V. V. Duong, B. Jeon
The increasing prevalence of VR/AR, as well as the expected availability of Light Field (LF) displays, will soon call for more practical methods to transmit LF images/video for services. In that respect, LF video coding should consider not only compression efficiency but also view random-access capability (especially in multi-view-based systems). The multi-view coding system heavily exploits view dependencies coming from both inter-view and temporal correlation. While such a system greatly improves compression efficiency, its view random-access capability can be much reduced due to the so-called "chain of dependencies." In this paper, we first model the chain of dependencies by a tree, then a cost function is used to assign an importance value to each tree node. Travelling from top to bottom, nodes of lesser importance are cut off, forming a pruned tree that reduces random-access complexity. Our tree pruning method has been shown to reduce random-access complexity by about 40% at the cost of minor compression loss compared to state-of-the-art methods. Furthermore, our method is very lightweight in its realization and is expected to be effective in a practical LF video coding system.
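A toy sketch of the pruning step follows: each node carries an importance value (in the paper this comes from the cost function) and low-importance dependency edges are removed top-down. The class structure, the threshold, and the example values are assumptions; in the actual codec the affected views would be re-anchored to another reference rather than discarded.

```python
class ViewNode:
    """A node in the view-dependency tree: a coded view/frame plus the views predicted from it."""
    def __init__(self, name, importance, children=None):
        self.name = name
        self.importance = importance      # value from the paper's cost function (assumed given)
        self.children = children or []

def prune(node, threshold):
    """Traverse top-down and drop dependency edges to children whose importance
    falls below the threshold, shortening the chains of dependencies."""
    node.children = [c for c in node.children if c.importance >= threshold]
    for c in node.children:
        prune(c, threshold)
    return node

root = ViewNode("anchor", 1.0, [
    ViewNode("view_1", 0.8, [ViewNode("view_1_1", 0.3)]),
    ViewNode("view_2", 0.2),
])
prune(root, threshold=0.4)
print([c.name for c in root.children])   # -> ['view_1']; its low-importance child is also dropped
```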
Citations: 7
A Unified Single Image De-raining Model via Region Adaptive Coupled Network
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301865
Q. Wu, Li Chen, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Single image de-raining is quite challenging due to the diversity of rain types and the inhomogeneous distribution of rainwater. By means of dedicated models and constraints, existing methods perform well for specific rain types. However, their generalization capability is highly limited as well. In this paper, we propose a unified de-raining model that selectively fuses the clean background of the input rain image with the well-restored regions occluded by various kinds of rain. This is achieved by our region adaptive coupled network (RACN), whose two branches integrate each other's features at different layers to jointly generate the spatially variant weight and the restored image, respectively. On the one hand, the weight branch can lead the restoration branch to focus on the regions that contribute more to de-raining. On the other hand, the restoration branch can guide the weight branch to avoid regions with over- or under-filtering risks. Extensive experiments show that our method outperforms many state-of-the-art de-raining algorithms on diverse rain types, including rain streaks, raindrops and rain mist.
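The selective fusion itself can be written in a few lines; the sketch below assumes the spatially varying weight map is already produced by the weight branch and simply blends the restored image with the rainy input per pixel. Variable names are illustrative, and the two network branches are not shown.

```python
import numpy as np

def fuse(rain_input, restored, weight):
    """Region-adaptive fusion: a spatially varying weight selects restored content in
    rain-occluded regions and keeps the clean background of the input elsewhere."""
    weight = np.clip(weight, 0.0, 1.0)[..., None]        # (H, W, 1), broadcast over RGB
    return weight * restored + (1.0 - weight) * rain_input

h, w = 4, 4
rain_input = np.random.rand(h, w, 3)
restored = np.random.rand(h, w, 3)
weight = np.random.rand(h, w)          # would come from the weight branch of RACN
print(fuse(rain_input, restored, weight).shape)   # (4, 4, 3)
```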
Citations: 6
Lightweight Color Image Demosaicking with Multi-Core Feature Extraction
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301841
Yufei Tan, Kan Chang, Hengxin Li, Zhenhua Tang, Tuanfa Qin
Convolutional neural network (CNN)-based color image demosaicking methods have achieved great success recently. However, in many applications where the computation resource is highly limited, it is not practical to deploy large-scale networks. This paper proposes a lightweight CNN for color image demosaicking. Firstly, to effectively extract shallow features, a multi-core feature extraction module, which takes the Bayer sampling positions into consideration, is proposed. Secondly, by taking advantage of inter-channel correlation, an attention-aware fusion module is presented to efficiently reconstruct the full color image. Moreover, a feature enhancement module, which contains several cascading attention-aware enhancement blocks, is designed to further refine the initial reconstructed image. To demonstrate the effectiveness of the proposed network, several state-of-the-art demosaicking methods are compared. Experimental results show that with the smallest number of parameters, the proposed network outperforms the other compared methods in terms of both objective and subjective qualities.
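As a rough illustration of taking the Bayer sampling positions into account, the sketch below splits an RGGB mosaic into its four sampling-position sub-planes, a common first step before applying separate ("multi-core") kernels per position. The RGGB layout is an assumption; the paper's actual module and kernel design are not shown.

```python
import numpy as np

def split_bayer_rggb(mosaic):
    """Split a single-channel RGGB Bayer mosaic into its four sampling-position
    sub-planes so that each position can be processed by its own kernels."""
    r  = mosaic[0::2, 0::2]
    g1 = mosaic[0::2, 1::2]
    g2 = mosaic[1::2, 0::2]
    b  = mosaic[1::2, 1::2]
    return np.stack([r, g1, g2, b], axis=0)    # (4, H/2, W/2)

mosaic = np.random.rand(8, 8)
print(split_bayer_rggb(mosaic).shape)   # (4, 4, 4)
```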
Citations: 2