
Latest publications from the 2020 IEEE International Conference on Image Processing (ICIP)

Calibrank: Effective Lidar-Camera Extrinsic Calibration By Multi-Modal Learning To Rank
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190991
Xiannong Wu, Chi Zhang, Yuehu Liu
Precise and online LiDAR-camera extrinsic calibration is one of the prerequisites of multi-modal data fusion for autonomous perception. Existing 6-DoF pose regression networks devote most of their effort to coarse-to-fine training strategies that gradually approach the global minimum. However, with limited computing resources, the optimal pose parameters remain out of reach. Moreover, recent research on neural network interpretability reveals that learning-based pose regression is essentially interpolation among the most relevant samples. Motivated by this notion, we propose to solve the calibration problem by retrieval. Concretely, a learning-to-rank pipeline is introduced to rank the top-n most relevant poses in the gallery set, which are then fused into the final prediction. To better exploit the pose relevance between ground-truth samples, we further propose an exponential mapping from the parametric space to the relevance space. The superiority of the proposed method is validated and demonstrated in comparative and ablation experiments.
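The retrieval-style formulation lends itself to a short illustration. The minimal numpy sketch below assumes an exponential relevance mapping of the form exp(-d/σ) on the parametric pose distance and a relevance-weighted average as the fusion rule for the top-n gallery poses; the bandwidth σ, the fusion rule, and the toy data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def relevance(pose_a, pose_b, sigma=1.0):
    """Exponential mapping from parametric pose distance to relevance.
    In training, this would turn ground-truth distances into ranking targets
    (sigma is an assumed bandwidth)."""
    return np.exp(-np.linalg.norm(pose_a - pose_b) / sigma)

def fuse_top_n(query_scores, gallery_poses, n=5):
    """Fuse the n highest-ranked gallery poses into one 6-DoF prediction by
    relevance-weighted averaging (an illustrative fusion rule; averaging
    rotation parameters directly is a simplification)."""
    top = np.argsort(query_scores)[::-1][:n]            # indices of the n most relevant poses
    weights = query_scores[top] / query_scores[top].sum()
    return weights @ gallery_poses[top]                 # weighted average of 6-DoF vectors

# toy usage: 100 gallery poses (6-DoF), scores as produced by a ranking network
gallery = np.random.randn(100, 6)
scores = np.random.rand(100)
print(fuse_top_n(scores, gallery))
```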
Citations: 1
Two-Step Progressive Intra Prediction For Versatile Video Coding
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190915
Meng Lei, Falei Luo, Xinfeng Zhang, Shanshe Wang, Siwei Ma
In traditional intra prediction, the nearest reference samples are used to generate the prediction block. Although more directional intra modes and reference lines have been introduced, encoders cannot efficiently predict complex content using only the local reference samples. To address this issue, a two-step progressive prediction method combining local and non-local information is proposed. The non-local information is obtained through template-matching-based prediction, and the local information is derived from the high-frequency coefficients of the first prediction step. Experimental results show that the proposed method achieves a 0.87% BD-rate reduction in VTM-7.0. In particular, the method offers significant advantages over prediction schemes that use only non-local information.
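As a rough illustration of the non-local step, the sketch below implements a generic template-matching prediction over the already-reconstructed area: the L-shaped template of the current block is compared (by SAD) against candidate templates, and the block attached to the best-matching template is copied as the prediction. The template thickness, block size, and search range are assumptions for illustration, not VTM's actual configuration.

```python
import numpy as np

def template_match_predict(recon, y, x, bs=8, t=2, search=32):
    """Predict the bs x bs block at (y, x) by searching the already-reconstructed
    rows above for the candidate whose L-shaped template (t rows above, t columns
    left) best matches the current block's template."""
    cur_tpl = np.concatenate([recon[y - t:y, x - t:x + bs].ravel(),
                              recon[y:y + bs, x - t:x].ravel()])
    best_cost, best = np.inf, None
    for yy in range(max(t, y - search), y - bs + 1):          # candidate blocks fully above
        for xx in range(max(t, x - search), min(recon.shape[1] - bs, x + search)):
            cand_tpl = np.concatenate([recon[yy - t:yy, xx - t:xx + bs].ravel(),
                                       recon[yy:yy + bs, xx - t:xx].ravel()])
            cost = np.abs(cur_tpl - cand_tpl).sum()           # SAD over the template
            if cost < best_cost:
                best_cost, best = cost, recon[yy:yy + bs, xx:xx + bs]
    return best                                               # copied as the non-local prediction
```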
Citations: 5
Supervised Multi-View Distributed Hashing
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191343
Yunpeng Tang, Xiaobo Shen, Zexuan Ji, Tao Wang, Peng Fu, Quansen Sun
Multi-view hashing efficiently integrates multi-view data to learn compact hash codes and achieves impressive large-scale retrieval performance. In real-world applications, multi-view data are often stored or collected in different locations, where hash code learning is more challenging yet less studied. To fill this gap, this paper proposes a novel supervised multi-view distributed hashing (SMvDisH) method for learning hash codes from multi-view data in a distributed manner. SMvDisH yields discriminative latent hash codes by jointly learning a latent factor model and a classifier. Under a local consistency assumption among neighbor nodes, the distributed learning problem is divided into a set of decentralized sub-problems. The sub-problems can be solved in parallel, and the computational and communication costs are low. Experimental results on three large-scale image datasets demonstrate that SMvDisH achieves competitive retrieval performance and trains faster than state-of-the-art multi-view hashing methods.
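The decentralized flavour of the method can be pictured with a toy sketch. Below, each node alternates a purely local update of its projection (a simple subspace iteration, used only as a stand-in for SMvDisH's joint latent-factor and classifier learning) with an averaging step over its graph neighbours, reflecting the local consistency assumption, and finally binarises the latent factors with sign(). Every modelling choice in this sketch is an assumption made for illustration.

```python
import numpy as np

def distributed_hash_codes(node_views, adjacency, dim=16, iters=20):
    """Toy decentralized hashing: local refinement per node, then consensus
    averaging with graph neighbours, then sign() to get binary codes."""
    n = len(node_views)
    d = node_views[0].shape[1]
    W = [np.linalg.qr(np.random.randn(d, dim))[0] for _ in range(n)]
    for _ in range(iters):
        # local step: push each projection towards its node's principal subspace
        W = [np.linalg.qr(X.T @ X @ Wi)[0] for X, Wi in zip(node_views, W)]
        # consensus step: average with neighbouring nodes (local consistency)
        W = [np.linalg.qr(sum(W[j] for j in [i] + adjacency[i]) / (1 + len(adjacency[i])))[0]
             for i in range(n)]
    return [np.sign(X @ Wi) for X, Wi in zip(node_views, W)]

# toy usage: 3 nodes in a line graph, each holding 50 samples of one 32-d view
views = [np.random.randn(50, 32) for _ in range(3)]
codes = distributed_hash_codes(views, adjacency={0: [1], 1: [0, 2], 2: [1]})
```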
Citations: 0
Deep Virtual Reference Frame Generation For Multiview Video Coding
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191112
Jianjun Lei, Zongqian Zhang, Dong Liu, Ying Chen, N. Ling
Multiview video involves a large amount of data, which brings great challenges to both storage and transmission. Thus, it is essential to increase the compression efficiency of multiview video coding. In this paper, a deep virtual reference frame generation method is proposed to improve the performance of multiview video coding. Specifically, a parallax-guided generation network (PGG-Net) is designed to transform the parallax relation between different viewpoints and generate a high-quality virtual reference frame. In the network, a multilevel receptive field module is designed to enlarge the receptive field and extract multi-scale deep features. After that, a parallax attention fusion module is used to transform the parallax and merge the features. The proposed method is integrated into the 3D-HEVC platform, and the generated virtual reference frame is inserted into the reference picture list as an additional reference. Experimental results show that the proposed method achieves a 5.31% average BD-rate reduction compared to 3D-HEVC.
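The exact layer layout of PGG-Net is not given here, but a common way to realise a multilevel receptive field module is with parallel dilated convolutions whose outputs are fused by a 1x1 convolution, as in the hedged PyTorch sketch below; the channel counts and dilation rates are assumed values, not the paper's.

```python
import torch
import torch.nn as nn

class MultilevelReceptiveField(nn.Module):
    """Assumed sketch: parallel 3x3 convolutions with increasing dilation see
    progressively larger contexts, and a 1x1 convolution fuses the multi-scale
    features."""
    def __init__(self, channels=64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [self.act(b(x)) for b in self.branches]       # multi-scale branches
        return self.act(self.fuse(torch.cat(feats, dim=1)))   # 1x1 fusion

# toy usage on a 64-channel feature map
y = MultilevelReceptiveField()(torch.randn(1, 64, 32, 32))    # -> (1, 64, 32, 32)
```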
Citations: 3
Prior Visual Relationship Reasoning For Visual Question Answering
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190771
Zhuoqian Yang, Zengchang Qin, Jing Yu, T. Wan
Visual Question Answering (VQA) is a representative task of cross-modal reasoning where an image and a free-form question in natural language are presented and the correct answer needs to be determined using both visual and textual information. One of the key issues of VQA is to reason with semantic clues in the visual content under the guidance of the question. In this paper, we propose Scene Graph Convolutional Network (SceneGCN) to jointly reason the object properties and their semantic relations for the correct answer. The visual relationship is projected into a deep learned semantic space constrained by visual context and language priors. Based on comprehensive experiments on two challenging datasets: GQA and VQA 2.0, we demonstrate the effectiveness and interpretability of the new model.
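SceneGCN's relation-aware reasoning can be pictured with a generic graph-convolution step over the scene graph, where each object node aggregates the features of the objects it is related to before a linear transform and non-linearity. The numpy sketch below shows this standard propagation rule, which stands in for (and is not identical to) the model's actual layer.

```python
import numpy as np

def graph_conv(node_feats, adjacency, weight):
    """One generic graph-convolution step over a scene graph: add self-loops,
    symmetrically normalise the adjacency, aggregate neighbour features, apply
    a linear transform and a ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])            # self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))    # symmetric normalisation
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ node_feats @ weight, 0)

# toy usage: 4 detected objects with 8-d features, relations from the scene graph
feats = np.random.randn(4, 8)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
out = graph_conv(feats, adj, np.random.randn(8, 8))
```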
Citations: 13
Targeted Incorporating Spatial Information in Sparse Subspace Clustering of Hyperspectral Remote Sensing Images
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191336
Jiaqiyu Zhan, Yuesheng Zhu, Zhiqiang Bai
Methods based on sparse subspace clustering (SSC) have shown great potential for hyperspectral image (HSI) clustering. However, their performance is limited by the complex spatial-spectral structure of HSIs. In this paper, a spatial best-fit direction (SBFD) algorithm is proposed to update the coefficients obtained from sparse representation into more discriminative features by integrating the spatial-contextual information given by the best-fit pixel of each target pixel. SBFD is also more targeted, searching for the best-fit direction rather than directly max-pooling over a local window. The proposed SBFD was tested on two widely used hyperspectral datasets, and the experimental results indicate improvements in clustering accuracy and spatial homogeneity.
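A hedged sketch of the "best-fit" idea: for one target pixel, search a small window for the spectrally most similar neighbour and blend the target's sparse-coefficient vector towards that neighbour's, rather than max-pooling over the whole window. The similarity metric, window size, and blending weight are assumptions for illustration.

```python
import numpy as np

def best_fit_update(coeffs, cube, y, x, win=3, alpha=0.5):
    """coeffs: (H, W, K) sparse coefficients per pixel; cube: (H, W, B) HSI.
    Find the spectrally closest neighbour inside a (2*win+1)^2 window and blend
    the target pixel's coefficients towards it (alpha is an assumed weight)."""
    h, w, _ = cube.shape
    best, best_d = None, np.inf
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            yy, xx = y + dy, x + dx
            if (dy, dx) == (0, 0) or not (0 <= yy < h and 0 <= xx < w):
                continue
            d = np.linalg.norm(cube[y, x] - cube[yy, xx])     # spectral distance
            if d < best_d:
                best_d, best = d, (yy, xx)
    coeffs[y, x] = (1 - alpha) * coeffs[y, x] + alpha * coeffs[best]
    return coeffs
```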
Citations: 0
Asymptotic Closed-Loop Design Of Transform Modes For The Inter-Prediction Residual In Video Coding
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191323
B. Vishwanath, Shunyao Li, K. Rose
Transform coding is a key component of video coders, tasked with spatial decorrelation of the prediction residual. There is growing interest in adapting the transform to local statistics of the inter-prediction residual, going beyond a few standard trigonometric transforms. However, the joint design of multiple transform modes is highly challenging due to critical stability problems inherent to feedback through the codec’s prediction loop, wherein training updates inadvertently impact the signal statistics the transform ultimately operates on, and are often counter-productive (and sometimes catastrophic). It is the premise of this work that a truly effective switched transform design procedure must account for and circumvent this shortcoming. We introduce a data-driven approach to design optimal transform modes for adaptive switching by the encoder. Most importantly, to overcome the critical stability issues, the approach is derived within an asymptotic closed loop (ACL) design framework, wherein each iteration operates in an effective open loop, and is thus inherently stable, but with a subterfuge that ensures that, asymptotically, the design approaches closed loop operation, as required for the ultimate coder operation. Experimental results demonstrate the efficacy of the proposed optimization paradigm which yields significant performance gains over the state-of-the-art.
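The asymptotic-closed-loop principle can be sketched in a few lines: every design iteration forms residuals by predicting from the previous iteration's reconstructions (so each pass is effectively open loop and stable), fits a transform to those residuals, and regenerates the reconstructions; as the reconstructions converge, the design approaches closed-loop operation. The single-transform KLT below, and the user-supplied predict/code callables, are simplifying assumptions, not the paper's multi-mode design.

```python
import numpy as np

def acl_transform_design(blocks, predict, code, iters=10):
    """blocks: temporally co-located small blocks (kept small so the KLT stays
    low-dimensional). predict: prediction from the previous reconstruction.
    code: quantised transform coding of a residual with a given basis."""
    recon = list(blocks)                                      # initialise with the originals
    for _ in range(iters):
        # open-loop pass: predictions come from the PREVIOUS iteration's reconstructions
        residuals = [b - predict(r_prev) for b, r_prev in zip(blocks[1:], recon[:-1])]
        cov = sum(np.outer(r.ravel(), r.ravel()) for r in residuals) / len(residuals)
        _, transform = np.linalg.eigh(cov)                    # KLT basis for this iteration
        new_recon = [blocks[0]]
        for b, r_prev in zip(blocks[1:], recon[:-1]):
            pred = predict(r_prev)
            new_recon.append(pred + code(b - pred, transform))
        recon = new_recon                                     # fed to the next open-loop pass
    return transform
```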
Citations: 1
Dynamic Background Subtraction Using Least Square Adversarial Learning
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191235
M. Sultana, Arif Mahmood, T. Bouwmans, Soon Ki Jung
Dynamic background subtraction (BS) is a fundamental problem in many vision-based applications. BS in real complex environments faces several challenging conditions such as illumination variations, shadows, camera jitter, and bad weather. In this study, we aim to address the challenges of BS in complex scenes by exploiting conditional least squares adversarial networks. During training, a scene-specific conditional least squares adversarial network with two additional regularizations, an L1 loss and a perceptual loss, is employed to learn the dynamic background variations. The input to the model is video frames conditioned on the corresponding ground truth, so that the model learns the dynamic changes in complex scenes. Afterwards, testing is performed on unseen video frames so that the generator conducts dynamic background subtraction. The proposed method, consisting of three loss terms (least squares adversarial loss, L1 loss, and perceptual loss), is evaluated on two benchmark datasets, CDnet2014 and BMC. The results show improved performance on both datasets compared with 10 existing state-of-the-art methods.
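The three-term generator objective described above can be sketched as follows; the loss weights, the conditioning of the discriminator on the input frame, and the fixed feature extractor used for the perceptual term are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc, feat_net, frame, fake_bg, true_bg, lam_l1=100.0, lam_p=10.0):
    """Sketch of the generator objective: a least-squares adversarial term that
    pushes the conditional discriminator's score on the generated background
    towards 1, an L1 term against the ground-truth background, and a perceptual
    term comparing features from a fixed network `feat_net`."""
    adv = ((disc(frame, fake_bg) - 1.0) ** 2).mean()          # least-squares GAN loss
    l1 = F.l1_loss(fake_bg, true_bg)                          # pixel-wise L1 regularisation
    perc = F.l1_loss(feat_net(fake_bg), feat_net(true_bg))    # perceptual regularisation
    return adv + lam_l1 * l1 + lam_p * perc
```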
Citations: 12
Hybrid Learning-Based And Hevc-Based Coding Of Light Fields
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190971
Milan Stepanov, G. Valenzise, F. Dufaux
Light fields have additional storage requirements compared to conventional image and video signals, and therefore demand an efficient representation. To improve coding efficiency, in this work we propose a hybrid coding scheme that combines a learning-based compression approach with a traditional video coding scheme. Their integration offers large gains at low and mid bitrates thanks to the efficient representation of the learning-based approach, and is competitive with standard tools at high bitrates thanks to the encoding of the residual signal. The proposed approach achieves, on average, 38% and 31% BD-rate savings compared to HEVC and the JPEG Pleno transform-based codec, respectively.
Citations: 0
Going Deeper With Neural Networks Without Skip Connections
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191356
O. Oyedotun, Abd El Rahman Shabayek, Djamila Aouada, B. Ottersten
We propose the training of very deep neural networks (DNNs) without shortcut connections, known as PlainNets. Training such networks is a notoriously hard problem due to: (1) the well-known challenge of vanishing and exploding activations, and (2) the less studied 'near singularity' problem. We argue that if these problems are tackled together, training deeper PlainNets becomes easier. Accordingly, we propose training very deep PlainNets by leveraging Leaky Rectified Linear Units (LReLUs), parameter constraints and strategic parameter initialization. Our approach is simple and allows very deep PlainNets with up to 100 layers to be trained successfully without shortcut connections. We validate this approach on five challenging datasets: MNIST, CIFAR-10, CIFAR-100, SVHN and ImageNet. We report the best results known on the ImageNet dataset for a PlainNet, with top-1 and top-5 error rates of 24.1% and 7.3%, respectively.
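A hedged sketch of the three ingredients: a shortcut-free stack of convolutions with Leaky ReLUs, Kaiming-style initialisation matched to the LReLU slope, and a hard norm constraint re-applied to the filters after every optimiser step. Depth, slope, and the norm bound are illustrative values rather than the paper's settings.

```python
import torch
import torch.nn as nn

def make_plainnet(depth=20, channels=64, slope=0.1):
    """Shortcut-free stack: Conv -> LeakyReLU repeated `depth` times, with
    Kaiming initialisation matched to the LReLU negative slope."""
    layers = []
    for _ in range(depth):
        conv = nn.Conv2d(channels, channels, 3, padding=1)
        nn.init.kaiming_normal_(conv.weight, a=slope, nonlinearity='leaky_relu')
        layers += [conv, nn.LeakyReLU(slope, inplace=True)]
    return nn.Sequential(*layers)

def constrain_norms(model, max_norm=1.0):
    """Renormalise any filter whose L2 norm exceeds max_norm -- the parameter
    constraint that keeps activations from exploding in very deep PlainNets."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                flat = m.weight.view(m.weight.size(0), -1)
                norms = flat.norm(dim=1, keepdim=True).clamp(min=1e-12)
                flat.mul_(norms.clamp(max=max_norm) / norms)

net = make_plainnet()
# ...after each optimiser step during training:
constrain_norms(net)
```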
Citations: 6