
Latest publications from the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)

Geodesic Disparity Compensation for Inter-View Prediction in VR180
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301750
K. Sivakumar, B. Vishwanath, K. Rose
The VR180 format is gaining considerable traction among the various promising immersive multimedia formats that will arguably dominate future multimedia consumption applications. VR180 enables stereo viewing of a hemisphere about the user. The increased field of view and the stereo setting result in extensive volumes of data that strongly motivate the pursuit of novel efficient compression tools tailored to this format. This paper’s focus is on the critical inter-view prediction module that exploits correlations between camera views. Existing approaches mainly consist of projection to a plane where traditional multi-view coders are applied, and disparity compensation employs simple block translation in the plane. However, warping due to the projection renders such compensation highly suboptimal. The proposed approach circumvents this shortcoming by performing geodesic disparity compensation on the sphere. It leverages the observation that, as an observer moves from one view point to the other, all points on surrounding objects are perceived to move along respective geodesics on the sphere, which all intersect at the two points where the axis connecting the two view points pierces the sphere. Thus, the proposed method performs inter-view prediction on the sphere by moving pixels along their predefined respective geodesics, and accurately captures the perceived deformations. Experimental results show significant bitrate savings and evidence the efficacy of the proposed approach.
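To make the geometric idea concrete, here is a minimal numerical sketch of moving a spherical direction along the great circle through it and the point where the inter-view axis pierces the sphere. The function names, the `disparity_angle` parameter, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rotate_about_axis(v, axis, angle):
    """Rodrigues' rotation of vector v about a unit axis by `angle` radians."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def geodesic_shift(p, epipole_axis, disparity_angle):
    """Move unit direction p along the great circle through p and the point
    where the axis connecting the two view points pierces the sphere."""
    p = p / np.linalg.norm(p)
    e = epipole_axis / np.linalg.norm(epipole_axis)
    n = np.cross(e, p)                # normal of the great circle through e and p
    if np.linalg.norm(n) < 1e-9:      # p lies on the axis itself: nothing to move
        return p
    # Rotating p about n keeps it on that great circle; the sign of the
    # disparity angle sets the direction of the shift.
    return rotate_about_axis(p, n, disparity_angle)

# Example: shift a viewing direction by a 0.5-degree disparity.
p = np.array([0.0, 0.0, 1.0])         # pixel direction on the unit sphere
axis = np.array([1.0, 0.0, 0.0])      # axis connecting the two view points
print(geodesic_shift(p, axis, np.deg2rad(0.5)))
```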
Citations: 0
Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301859
Chang Liu, Ke-bin Jia, Pengyu Liu
Compared with traditional High Efficiency Video Coding (HEVC), 3D-HEVC introduces multi-view coding and depth map coding, which leads to a significant increase in coding complexity. In this paper, we propose a low-complexity intra coding algorithm for depth maps based on an end-to-end edge detection network. First, we use a Holistically Nested Edge Detection (HED) network to determine the edge locations of the depth map. Second, we use the Otsu method to divide the output of the HED into foreground and background regions. Finally, the CU size and the candidate list of intra modes are determined according to the region of the coding tree unit (CTU). Experimental results demonstrate that the proposed algorithm reduces encoding time by 39.56% on average with negligible degradation of coding performance.
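As a rough illustration of the thresholding-and-classification step, the sketch below applies Otsu's method to an 8-bit HED edge map and flags each CTU as foreground or background. The 64x64 CTU size and all function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def otsu_threshold(edge_map):
    """Otsu's method on an 8-bit (0-255) edge map; returns the best threshold."""
    hist, _ = np.histogram(edge_map, bins=256, range=(0, 256))
    total = edge_map.size
    sum_all = np.dot(np.arange(256), hist)
    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

def classify_ctus(edge_map, ctu_size=64):
    """Mark each CTU as foreground (contains HED edges) or background."""
    t = otsu_threshold(edge_map)
    binary = edge_map > t
    h, w = edge_map.shape
    flags = {}
    for y in range(0, h, ctu_size):
        for x in range(0, w, ctu_size):
            flags[(y, x)] = bool(binary[y:y + ctu_size, x:x + ctu_size].any())
    return flags
```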
Citations: 2
Recent Advances in End-to-End Learned Image and Video Compression
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301753
Wen-Hsiao Peng, H. Hang
The DCT-based transform coding technique has been adopted by international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The deep learning technology developed recently may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) briefly summarize the progress of this topic in the past three or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve PSNR similar to BPG (Better Portable Graphics, the H.265 still-image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve comparable or superior rate-distortion performance to HEVC/H.265. The CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
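For readers unfamiliar with end-to-end learned codecs, the following minimal sketch shows the rate-distortion objective such autoencoder-based schemes are typically trained on; the variable names and the lambda value are illustrative assumptions, not a specific scheme from the tutorial.

```python
import numpy as np

def rd_loss(likelihoods, original, reconstruction, lmbda=0.01):
    """Rate-distortion objective L = R + lambda * D used to train learned codecs:
    R is the expected code length from the entropy model (in bits per pixel),
    D is the reconstruction MSE, and lambda sets the trade-off."""
    rate = -np.sum(np.log2(likelihoods)) / original.size     # bits per pixel
    distortion = np.mean((original - reconstruction) ** 2)   # MSE
    return rate + lmbda * distortion
```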
Citations: 1
Mining Larger Class Activation Map with Common Attribute Labels
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301872
Runtong Zhang, Fanman Meng, Hongliang Li, Q. Wu, K. Ngan
Class Activation Map (CAM) is the visualization of target regions generated from classification networks. However, a classification network trained with class-level labels only has high responses to a few features of objects, so the network cannot discriminate the whole target. We argue that the original labels used in classification tasks are not enough to describe all features of the objects. If we annotate more detailed labels, such as class-agnostic attribute labels for each image, the network may be able to mine larger CAMs. Motivated by this idea, we propose and design common attribute labels, which are lower-level labels summarized from the original image-level categories to describe more details of the target. Moreover, it should be emphasized that our proposed labels generalize well to unknown categories, since attributes (such as head, body, etc.) in some categories (such as dog, cat, etc.) are common and class-agnostic. That is why we call our proposed labels common attribute labels: they are lower-level and more general compared with traditional labels. We finish the annotation work based on the PASCAL VOC2012 dataset and design a new architecture to successfully classify these common attribute labels. After fusing the features of attribute labels into the original categories, our network can mine larger CAMs of objects. Our method achieves visually better CAM results and higher evaluation scores compared with traditional methods.
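For context, the baseline this work builds on is the standard class activation map of Zhou et al. (2016), which can be sketched as a classifier-weighted sum of the last convolutional feature maps. This is a generic illustration of CAM itself, not the proposed attribute-label network.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard CAM: weighted sum of the last conv feature maps using the
    global-average-pool classifier weights of the chosen class.
    feature_maps: (C, H, W) activations from the last conv layer
    fc_weights:   (num_classes, C) classifier weight matrix"""
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    cam = np.maximum(cam, 0)            # keep only positive evidence
    return cam / (cam.max() + 1e-8)     # normalize to [0, 1]
```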
Citations: 1
IBC-Mirror Mode for Screen Content Coding for the Next Generation Video Coding Standards
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301811
Jian Cao, Zhen Qiu, Zhengren Li, Fan Liang, Jun Wang
This paper proposes an IBC-Mirror mode for Screen Content Coding (SCC) for the next-generation video coding standards, including Versatile Video Coding (VVC) and Audio Video Standard-3 in China (AVS3). This is the first time the mirror characteristic has been taken into consideration for SCC in VVC/AVS3. Based on the translational motion model of the Intra Block Copy (IBC) mode, a "horizontal and vertical flipping" function is further added to reduce prediction error and improve coding efficiency. The proposed IBC-Mirror mode is implemented on the latest reference software, including VTM5.0 (VVC) and HPM-5.0 (AVS3). Simulations show that the proposed mode achieves up to 1~2% (VVC) and 4~7% (AVS3) BD-rate savings for SCC test sequences. Drafts about the mode have been submitted to the AVS meeting and investigated in SCC Core Experiments (CE).
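A minimal sketch of the flipping idea is given below: for a given block vector, the predictor can be the referenced block itself or its horizontally/vertically flipped version, and the candidate with the smallest SAD is kept. The function names and the SAD-based selection are illustrative assumptions rather than the VTM/HPM implementation.

```python
import numpy as np

def ibc_mirror_candidates(ref, x, y, bw, bh):
    """Reference block at (x, y) plus its horizontally and vertically flipped versions."""
    block = ref[y:y + bh, x:x + bw]
    return {"copy": block, "hflip": block[:, ::-1], "vflip": block[::-1, :]}

def best_prediction(cur_block, ref, bv_x, bv_y):
    """Pick the IBC / IBC-Mirror candidate with the smallest SAD."""
    bh, bw = cur_block.shape
    cands = ibc_mirror_candidates(ref, bv_x, bv_y, bw, bh)
    sads = {m: int(np.abs(cur_block.astype(np.int32) - p.astype(np.int32)).sum())
            for m, p in cands.items()}
    mode = min(sads, key=sads.get)
    return mode, cands[mode], sads[mode]
```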
Citations: 1
DENESTO: A Tool for Video Decoding Energy Estimation and Visualization
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301877
Matthias Kränzler, Christian Herglotz, A. Kaup
Previous research has shown that the decoding energy demand of several video codecs can be estimated accurately using bit-stream feature-based models. In this paper, we show that visualization with the Decoding Energy Estimation Tool (DENESTO) can help improve the understanding of the decoder's energy demand.
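A bit-stream feature-based energy model of the kind referenced here can be sketched as a weighted sum of feature counts; the feature names and coefficients below are made-up placeholders for illustration, not DENESTO's trained model.

```python
def estimate_decoding_energy(feature_counts, energy_per_feature):
    """Bit-stream-feature energy model: E = sum over features of n_f * e_f,
    where n_f counts occurrences of a feature in the bit stream and e_f is a
    trained per-feature energy coefficient (in Joules)."""
    return sum(n * energy_per_feature.get(f, 0.0) for f, n in feature_counts.items())

# Hypothetical feature names and coefficients, purely for illustration.
coeffs = {"intra_cu": 1.2e-6, "inter_cu": 2.3e-6, "transform_coeff": 4.0e-9}
counts = {"intra_cu": 1800, "inter_cu": 5400, "transform_coeff": 2_500_000}
print(estimate_decoding_energy(counts, coeffs))
```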
Citations: 2
Robust Visual Tracking Via An Imbalance-Elimination Mechanism
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301862
Jin Feng, Kaili Zhao, Xiaolin Song, Anxin Li, Honggang Zhang
Competitive performance in visual tracking is achieved mostly by tracking-by-detection approaches, whose accuracy highly relies on a binary classifier that distinguishes targets from distractors in a set of candidates. However, severe class imbalance, with few positives (e.g., targets) relative to negatives (e.g., backgrounds), degrades classification accuracy or increases tracking bias. In this paper, we propose an imbalance-elimination mechanism that adopts a multi-class paradigm and utilizes a novel candidate generation strategy. Specifically, our multi-class model assigns samples to one positive class and four proposed negative classes, naturally alleviating class imbalance. We define the negative classes by introducing the proportion of the target in each sample, whose value explicitly reveals the relative scale between target and background. Furthermore, during candidate generation, we exploit such scale-aware negative patterns to help adjust the search areas of candidates to incorporate larger target proportions; thus more accurate target candidates are obtained and more positive samples are included, easing class imbalance at the same time. Extensive experiments on standard benchmarks show that our tracker achieves favorable performance against state-of-the-art approaches and offers robust discrimination of positive targets and negative patterns.
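The class-assignment idea can be illustrated with a small sketch that bins candidate boxes by the proportion of the target they contain; the thresholds and the exact 1-positive/4-negative split below are illustrative assumptions, not the paper's settings.

```python
def target_proportion(candidate, target):
    """Fraction of the candidate box area occupied by the ground-truth target.
    Boxes are (x1, y1, x2, y2)."""
    cx1, cy1, cx2, cy2 = candidate
    tx1, ty1, tx2, ty2 = target
    ix1, iy1 = max(cx1, tx1), max(cy1, ty1)
    ix2, iy2 = min(cx2, tx2), min(cy2, ty2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    cand_area = max(1e-6, (cx2 - cx1) * (cy2 - cy1))
    return inter / cand_area

def assign_class(candidate, target, pos_thresh=0.7, neg_bins=(0.0, 0.2, 0.4, 0.6)):
    """One positive class plus four scale-aware negative classes keyed on the
    target proportion inside the candidate (thresholds are hypothetical)."""
    p = target_proportion(candidate, target)
    if p >= pos_thresh:
        return 0                      # positive class
    for k in range(len(neg_bins) - 1, -1, -1):
        if p >= neg_bins[k]:
            return 1 + k              # negative classes 1..4
    return 1
```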
Citations: 0
A review of data preprocessing modules in digital image forensics methods using deep learning
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301880
Alexandre Berthet, J. Dugelay
Access to technologies like mobile phones contributes to the significant increase in the volume of digital visual data (images and videos). In addition, photo-editing software is becoming increasingly powerful and easy to use. In some cases, these tools can be utilized to produce forgeries with the objective of changing the semantic meaning of a photo or a video (e.g., fake news). Digital image forensics (DIF) has two main objectives: the detection (and localization) of forgery and the identification of the origin of the acquisition (i.e., sensor identification). Since 2005, many classical methods for DIF have been designed, implemented, and tested on several databases. Meanwhile, innovative approaches based on deep learning have emerged in other fields and have surpassed traditional techniques. In the context of DIF, deep learning methods mainly use convolutional neural networks (CNNs) associated with significant preprocessing modules. This is an active domain, and two possible ways to operate the preprocessing have been studied: prior to the network or incorporated into it. None of the various studies on digital image forensics provides a comprehensive overview of the preprocessing techniques used with deep learning methods. Therefore, the core objective of this article is to review the preprocessing modules associated with CNN models.
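One representative preprocessing module of the kind surveyed here is fixed high-pass (noise-residual) filtering applied before, or as the first layer of, the CNN. The sketch below uses one classic SRM-style kernel and is a generic illustration, not a module from any specific reviewed paper.

```python
import numpy as np
from scipy.signal import convolve2d

# A fixed high-pass residual kernel (one of the classic SRM-style filters often
# used for noise-residual preprocessing in image forensics).
HIGH_PASS = np.array([[-1,  2, -1],
                      [ 2, -4,  2],
                      [-1,  2, -1]], dtype=np.float32) / 4.0

def residual_preprocess(gray_image):
    """Suppress image content and keep the noise residual before feeding a CNN."""
    return convolve2d(gray_image.astype(np.float32), HIGH_PASS,
                      mode="same", boundary="symm")
```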
Citations: 10
Attention-Guided Fusion Network of Point Cloud and Multiple Views for 3D Shape Recognition
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301813
Bo Peng, Zengrui Yu, Jianjun Lei, Jiahui Song
With the dramatic growth of 3D shape data, 3D shape recognition has become a hot research topic in the field of computer vision. How to effectively utilize the multimodal characteristics of 3D shapes is one of the key problems in boosting the performance of 3D shape recognition. In this paper, we propose a novel attention-guided fusion network of point cloud and multiple views for 3D shape recognition. Specifically, in order to obtain a more discriminative descriptor for 3D shape data, an inter-modality attention enhancement module and a view-context attention fusion module are proposed to gradually refine and fuse the features of the point cloud and multiple views. In the inter-modality attention enhancement module, an inter-modality attention mask based on the joint feature representation is computed, so that the features of each modality are enhanced by fusing the correlated information between the two modalities. After that, the view-context attention fusion module is proposed to explore the context information of multiple views and fuse the enhanced features to obtain a more discriminative descriptor for 3D shape data. Experimental results on the ModelNet40 dataset demonstrate that the proposed method achieves promising performance compared with state-of-the-art methods.
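As a toy illustration of inter-modality attention, the sketch below derives a per-dimension mask from the concatenated (joint) feature and uses it to re-weight the point-cloud and view features before fusing them. The shapes, softmax choice, and function names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_enhance_and_fuse(point_feat, view_feat):
    """Toy inter-modality attention: build a mask from the joint feature and
    use it to re-weight each modality before concatenating the fused descriptor.
    Both inputs are assumed to be 1-D feature vectors of the same length."""
    joint = np.stack([point_feat, view_feat])       # joint representation, shape (2, D)
    mask = softmax(joint, axis=0)                   # per-dimension inter-modality weights
    enhanced_point = point_feat * (1.0 + mask[0])   # enhance with cross-modal weighting
    enhanced_view = view_feat * (1.0 + mask[1])
    return np.concatenate([enhanced_point, enhanced_view])
```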
Citations: 4
Special Cane with Visual Odometry for Real-time Indoor Navigation of Blind People
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301782
Tang Tang, Menghan Hu, Guodong Li, Qingli Li, Jian Zhang, Xiaofeng Zhou, Guangtao Zhai
Indoor navigation is urgently needed by blind people in their everyday lives. In this paper, we design an assistive cane with visual odometry, based on the actual requirements of blind users, to help them navigate safely indoors. Compared to state-of-the-art indoor navigation systems, the proposed device is portable, compact, and adaptable. The main specifications of the system are: the perception range is 0.10m to 2.10m in width and 0.08m to 1.60m in length; the maximum weight is 2.1kg; the detection range is 0.15m to 3.00m; the cruising time is about 8h; and objects whose heights are below 80cm can be detected. A demo video of the proposed navigation system is available at: https://doi.org/10.6084/m9.figshare.12399572.v1.
Citations: 3