
Latest publications from the 2020 IEEE International Conference on Image Processing (ICIP)

Video-Based Coding Of Volumetric Data
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190689
D. Graziosi, B. Kroon
New standards are emerging for the coding of volumetric 3D data such as immersive video and point clouds. Some of these volumetric encoders utilize video codecs as the core of their compression approach, but apply different techniques to convert volumetric 3D data into 2D content for subsequent 2D video compression. Currently in MPEG there are two activities that follow this paradigm: ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC) and ISO/IEC 23090-12 MPEG Immersive Video (MIV). In this article we propose that both standards define 2D projection as a common transmission format. We then describe a procedure based on camera projections, applicable to both standards, for converting 3D information into 2D images for efficient 2D compression. Results show that our approach successfully encodes both point clouds and immersive video content with the same performance as the current test models that MPEG experts developed separately for the respective standards. We conclude the article by discussing further integration steps and future directions.
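The camera-projection step can be pictured with a standard pinhole model. The sketch below is a minimal illustration of projecting 3D points onto a 2D image plane, with illustrative intrinsic parameters; it is not the V-PCC or MIV reference implementation.

```python
# Minimal pinhole-projection sketch: 3D camera-space points -> 2D pixel coordinates.
# Intrinsics (fx, fy, cx, cy) are illustrative placeholders.
import numpy as np

def project_points(points_3d, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0):
    """Project an Nx3 array of camera-space points (X, Y, Z) to Nx2 pixel coordinates."""
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = fx * X / Z + cx   # horizontal pixel coordinate
    v = fy * Y / Z + cy   # vertical pixel coordinate
    return np.stack([u, v], axis=1)

# Example: two points in front of the camera (metres).
pts = np.array([[0.1, -0.2, 2.0],
                [0.5,  0.3, 3.5]])
print(project_points(pts))
```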
Citations: 3
Improved Intra Coding Beyond AV1 Using Adaptive Prediction Angles and Reference Lines
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191279
Liang Zhao, Xin Zhao, Shan Liu
AV1 employs a fixed set of intra prediction angles, using the reconstructed samples in the adjacent reference line, to remove the spatial redundancy of video signals. Two methods are proposed in this paper to further improve the intra coding performance of AV1. Firstly, to signal the intra prediction modes (IPMs) more efficiently, only a subset of the IPMs is allowed and signaled for each block, with the subset adaptively selected according to the IPMs of neighboring blocks. Secondly, to reduce the prediction errors when there is a strong discontinuity between the samples in the current block and its adjacent reference samples, an adaptive reference line selection method is proposed that enables farther reference lines for intra prediction. Experimental results show that the proposed methods achieve 2.2% luma BD-rate savings at around 150% encoding time for intra coding on top of the libaom implementation of AV1.
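The BD-rate figure reported above is conventionally computed with the Bjøntegaard-delta method. Below is a minimal sketch of that calculation, assuming four rate/PSNR points per configuration and purely illustrative numbers; it is not the libaom test harness.

```python
# Bjøntegaard-delta rate sketch: fit log(rate) vs. PSNR with cubics,
# integrate over the overlapping PSNR range, report the average bitrate
# difference of the test codec against the reference, in percent.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_ref = (np.polyval(int_ref, hi) - np.polyval(int_ref, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    return (np.exp(avg_test - avg_ref) - 1.0) * 100.0   # negative = bitrate saving

# Illustrative rate (kbps) / PSNR (dB) points only:
print(bd_rate([400, 800, 1600, 3200], [34.0, 36.5, 39.0, 41.0],
              [380, 760, 1500, 3000], [34.1, 36.6, 39.1, 41.1]))
```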
Citations: 3
A Cross-Modal Variational Framework For Food Image Analysis
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190758
T. Theodoridis, V. Solachidis, K. Dimitropoulos, P. Daras
Food analysis resides at the core of modern nutrition recommender systems, providing the foundation for a high-level understanding of users’ eating habits. This paper focuses on the sub-task of ingredient recognition from food images using a variational framework. The framework consists of two variational encoder-decoder branches, aimed at processing information from different modalities (images and text), as well as a variational mapper branch, which accomplishes the task of aligning the distributions of the individual branches. Experimental results on the Yummly-28K dataset show that the proposed framework performs better than similar variational frameworks, while it surpasses current state-of-the-art approaches on the large-scale Recipe1M dataset.
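The alignment performed by the mapper branch can be illustrated with a KL-divergence term between the Gaussian posteriors produced by the image and text branches. The sketch below assumes a PyTorch setup and reduces the encoders to placeholder tensors; it illustrates the alignment principle only, not the authors' architecture.

```python
# Align two branch posteriors N(mu_p, var_p) and N(mu_q, var_q) with a KL term.
import torch

def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ), summed over latent dims, averaged over batch."""
    var_p, var_q = logvar_p.exp(), logvar_q.exp()
    kl = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return kl.sum(dim=-1).mean()

# Placeholder posteriors standing in for the image- and text-branch encoders (batch 8, 64-d latent).
mu_img, logvar_img = torch.randn(8, 64), torch.zeros(8, 64)
mu_txt, logvar_txt = torch.randn(8, 64), torch.zeros(8, 64)
alignment_loss = gaussian_kl(mu_img, logvar_img, mu_txt, logvar_txt)
print(alignment_loss.item())
```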
Citations: 1
Optimization Of Motion Compensation Based On GPU And CPU For VVC Decoding
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190708
Xu Han, Shanshe Wang, Siwei Ma, Wen Gao
To achieve higher compression efficiency, the emerging video coding standard Versatile Video Coding (VVC) introduces a large number of new coding technologies, which significantly increases the computational complexity of the decoder. Among these technologies, the inter prediction methods, including affine motion compensation and decoder-side motion vector refinement (DMVR), make inter prediction the most time-consuming module and bring new challenges for real-time decoding. In this paper, we propose an efficient GPU-based motion compensation scheme to speed up decoding. By re-partitioning coding units (CUs) according to their data dependencies and using different thread organization methods for different situations, the computational resources of the GPU are utilized efficiently. Experiments on an NVIDIA GeForce RTX 2080Ti GPU show that motion compensation can be done in 5 ms for Ultra HD 4K, which means the decoding speed is accelerated by 16 times compared to the VVC reference software on the CPU.
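The scheduling idea, grouping coding units whose computations do not depend on each other so that each group can map to one parallel launch, can be pictured with a simple wavefront rule over the CU grid. The sketch below stands in for the real data-dependency analysis and is not the authors' CUDA implementation.

```python
# Group CU grid positions into batches that have no mutual dependency under a
# wavefront (anti-diagonal) assumption; each batch could map to one GPU kernel launch.
def wavefront_batches(n_rows, n_cols):
    batches = {}
    for r in range(n_rows):
        for c in range(n_cols):
            batches.setdefault(r + c, []).append((r, c))   # same anti-diagonal -> same batch
    return [batches[d] for d in sorted(batches)]

for i, batch in enumerate(wavefront_batches(3, 4)):
    print(f"parallel batch {i}: {batch}")
```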
Citations: 5
Residual Networks Based Distortion Classification and Ranking for Laparoscopic Image Quality Assessment
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191111
Zohaib Amjad Khan, Azeddine Beghdadi, M. Kaaniche, F. A. Cheikh
Laparoscopic images and videos are often affected by different types of distortion, such as noise, smoke, blur and non-uniform illumination. Automatic detection of these distortions, generally followed by the application of appropriate image quality enhancement methods, is critical to avoid errors during surgery. In this context, a crucial step involves an objective assessment of the image quality, which is a two-fold problem requiring both the classification of the distortion type affecting the image and the estimation of the severity level of that distortion. Unlike existing image quality measures, which focus mainly on estimating a quality score, we propose in this paper to formulate the image quality assessment task as a multi-label classification problem taking into account both the type and the severity level (or rank) of distortions. This problem is then solved with a deep neural network based approach. The results obtained on a laparoscopic image dataset show the efficiency of the proposed approach.
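One way to picture the joint type-and-severity formulation is a shared backbone with two classification heads trained jointly. The sketch below assumes a PyTorch setup with placeholder feature dimensions and class counts; it is one possible reading of the multi-label formulation, not the authors' exact network.

```python
# Two-headed classifier sketch: one head for distortion type, one for severity level.
import torch
import torch.nn as nn

class DistortionClassifier(nn.Module):
    def __init__(self, feat_dim=512, n_types=4, n_levels=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.type_head = nn.Linear(256, n_types)    # distortion type logits
        self.level_head = nn.Linear(256, n_levels)  # severity level logits

    def forward(self, features):
        h = self.shared(features)
        return self.type_head(h), self.level_head(h)

model = DistortionClassifier()
feats = torch.randn(8, 512)                      # e.g. pooled residual-network features
type_logits, level_logits = model(feats)
loss = nn.CrossEntropyLoss()(type_logits, torch.randint(0, 4, (8,))) \
     + nn.CrossEntropyLoss()(level_logits, torch.randint(0, 5, (8,)))
print(loss.item())
```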
Citations: 11
Parallax Motion Effect Generation Through Instance Segmentation And Depth Estimation
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191168
A. Pinto, Manuel Alberto Cordova Neira, L. G. L. Decker, J. L. Flores-Campana, M. R. Souza, A. Santos, Jhonatas S. Conceição, H. F. Gagliardi, D. Luvizon, R. Torres, H. Pedrini
Stereo vision is a growing topic in computer vision due to the innumerable opportunities and applications this technology offers for the development of modern solutions, such as virtual and augmented reality applications. To enhance the user’s experience in three-dimensional virtual environments, motion parallax estimation is a promising technique for achieving this objective. In this paper, we propose an algorithm for generating parallax motion effects from a single image, taking advantage of state-of-the-art instance segmentation and depth estimation approaches. This work also presents a comparison against such algorithms to investigate the trade-off between efficiency and quality of the parallax motion effects, taking into consideration a multi-task learning network capable of estimating instance segmentation and depth estimation at once. Experimental results and visual quality assessment indicate that the PyD-Net network (depth estimation) combined with the Mask R-CNN or FBNet networks (instance segmentation) can produce parallax motion effects with good visual quality.
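The underlying parallax idea can be illustrated with a simple depth-proportional pixel shift: once a depth map is available, nearer pixels are displaced more than distant ones as the virtual camera translates. The sketch below is a minimal numpy illustration with an assumed shift parameter, not the authors' rendering pipeline.

```python
# Forward-warp an image by shifting each pixel in proportion to its inverse depth.
import numpy as np

def parallax_shift(image, depth, camera_shift=8.0):
    """Warp an HxWx3 image: each pixel moves horizontally by camera_shift / depth."""
    h, w = depth.shape
    out = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        new_x = np.clip((xs + camera_shift / depth[y]).astype(int), 0, w - 1)
        out[y, new_x] = image[y, xs]   # nearer pixels (small depth) move further
    return out

img = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
depth = np.random.uniform(1.0, 10.0, (120, 160))
print(parallax_shift(img, depth).shape)
```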
Citations: 3
Online Learning for Beta-Liouville Hidden Markov Models: Incremental Variational Learning for Video Surveillance and Action Recognition
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191144
Samr Ali, N. Bouguila
Challenges in the real-time deployment of surveillance systems are an active area of research, especially with the use of adaptable machine learning techniques. In this paper, we propose the use of variational learning of Beta-Liouville (BL) hidden Markov models (HMMs) for action recognition (AR) in an online setup. The proposed incremental framework enables continuous adjustment of the system for better modelling. We evaluate the proposed model on the visible IOSB dataset to validate the framework.
Citations: 2
Prediction-Decision Network For Video Object Tracking
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191145
Yasheng Sun, Tao He, Ying-hong Peng, Jin Qi, Jie Hu
In this paper, we introduce an approach for visual tracking in videos that predicts the bounding box location of a target object at every frame. The tracking problem is formulated as a sequential decision-making process in which both historical and current information are taken into account to decide the correct object location. We develop a deep reinforcement learning based strategy, via which the target object position is predicted and decided in a unified framework. Specifically, an RNN-based prediction network is developed in which local features and global features are fused together to predict object movement. The predicted movement, a set of predefined possible offsets, and detection results together form an action space. A decision network is trained in a reinforcement manner to learn to select the most reasonable tracking box from the action space, through which the target object is tracked at each frame. Experiments on an existing tracking benchmark demonstrate the effectiveness and robustness of our proposed strategy.
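The action space described above can be pictured as a small set of candidate boxes, built from predefined offsets, the predicted movement, and a detection result, from which the decision network selects one. The sketch below uses illustrative offsets, an assumed box format, and a dummy scorer standing in for the trained decision network.

```python
# Build a candidate action space and select a box by maximum score.
import numpy as np

def build_action_space(prev_box, predicted_move, detection_box):
    x, y, w, h = prev_box
    offsets = [(0, 0), (4, 0), (-4, 0), (0, 4), (0, -4)]               # predefined shifts
    candidates = [(x + dx, y + dy, w, h) for dx, dy in offsets]
    candidates.append((x + predicted_move[0], y + predicted_move[1], w, h))  # predicted movement
    candidates.append(detection_box)                                    # detection result
    return np.array(candidates, dtype=np.float32)

def select_box(candidates, score_fn):
    scores = np.array([score_fn(c) for c in candidates])
    return candidates[int(scores.argmax())]        # the decision network would provide the scores

# Dummy scorer: prefer boxes close to (55, 58).
best = select_box(build_action_space((50, 60, 32, 48), (3, -2), (55, 58, 32, 48)),
                  lambda box: -abs(box[0] - 55) - abs(box[1] - 58))
print(best)
```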
Citations: 0
A Spatio-Angular Binary Descriptor For Fast Light Field Inter View Matching
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191118
Martin Alain, A. Smolic
Light fields are able to capture light rays from a scene arriving at different angles, effectively creating multiple perspective views of the same scene. Thus, one of the flagship applications of light fields is to estimate the captured scene geometry, which can notably be achieved by establishing correspondences between the perspective views, usually in the form of a disparity map. Such correspondence estimation has been a long-standing research topic in computer vision, with applications to stereo vision and optical flow. Research in this area has shown the importance of well-designed descriptors to enable fast and accurate matching. We propose in this paper a binary descriptor exploiting the light field gradient over both the spatial and the angular dimensions in order to improve inter-view matching. We demonstrate in a disparity estimation application that it achieves accuracy comparable to existing descriptors while being faster to compute.
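The general principle of such a descriptor can be sketched as thresholding light-field gradients along the spatial and angular axes into bits and matching by Hamming distance. The example below illustrates that principle only; it is not the descriptor proposed in the paper.

```python
# Turn gradients of a 4D light-field patch into a binary descriptor and compare by Hamming distance.
import numpy as np

def binary_descriptor(lf_patch):
    """lf_patch: 4D array indexed (v, u, y, x). Returns a flat boolean bit vector."""
    bits = []
    for axis in range(4):                       # two angular axes + two spatial axes
        grad = np.diff(lf_patch, axis=axis)     # gradient along this dimension
        bits.append((grad > 0).ravel())         # keep only the sign as bits
    return np.concatenate(bits)

def hamming(d1, d2):
    return np.count_nonzero(d1 != d2)

patch_a = np.random.rand(5, 5, 8, 8)            # 5x5 angular views, 8x8 spatial window
patch_b = np.random.rand(5, 5, 8, 8)
print(hamming(binary_descriptor(patch_a), binary_descriptor(patch_b)))
```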
Citations: 3
CDVA/VCM: Language for Intelligent and Autonomous Vehicles
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190735
Baohua Sun, Hao Sha, M. Rafie, Lin Yang
Intelligent transportation is a complex system that involves the interaction of connected technologies, including Smart Sensors, Intelligent and Autonomous Vehicles, High Precision Maps, and 5G. The coordination of all these machines mandates a common language that serves as a protocol for intelligent machines to communicate. International standards serve as the global protocol to satisfy industry needs at the product level. MPEG-CDVA is the official ISO standard for search and retrieval applications, providing Compact Descriptors for Video Analysis (CDVA). It is robust and enables efficient implementations on embedded systems. CDVA is the first-generation language for images and videos. MPEG-VCM is developing advanced features beyond CDVA for the next generation, Video Coding for Machines (VCM). With the wide availability of low-power AI chips, CDVA and VCM are ready to deploy and to serve as the language for intelligent and autonomous vehicles. In this paper, we demonstrate the use of the SuperCDVA and Closed Captioning CDVA algorithms for intelligent and autonomous vehicles. Concepts are borrowed from the Super Characters algorithm in Natural Language Processing. In order for intelligent and autonomous vehicles to understand events on the road, the CDVA vectors are organized into an image to represent the story of the video.
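The step of organizing descriptor vectors "into an image" echoes the Super Characters idea: per-segment feature vectors are laid out as rows of a 2D array so that an image classifier can read the whole clip. The sketch below uses placeholder vector lengths and segment counts that are not CDVA-normative values.

```python
# Lay out one descriptor vector per video segment as one row of a 2D "story image".
import numpy as np

def descriptors_to_image(descriptors, height=224, width=224):
    """descriptors: list of 1D feature vectors, one per video segment."""
    rows = [np.resize(d, width) for d in descriptors]   # fit each vector to one image row
    canvas = np.zeros((height, width), dtype=np.float32)
    canvas[:len(rows)] = np.stack(rows)[:height]        # stack rows top to bottom
    return canvas

clip_descriptors = [np.random.rand(512) for _ in range(32)]  # 32 segments, 512-d each (placeholders)
story_image = descriptors_to_image(clip_descriptors)
print(story_image.shape)
```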
Citations: 4