
Latest articles from IEEE Transactions on Image Processing

Towards weakly-supervised focus region detection via recurrent constraint network.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-25 DOI: 10.1109/TIP.2019.2942505
Wenda Zhao, Xueqing Hou, Xiaobing Yu, You He, Huchuan Lu

Recent state-of-the-art methods for focus region detection (FRD) rely on deep convolutional networks trained with costly pixel-level annotations. In this study, we propose an FRD method that achieves competitive accuracy while using only easily obtained bounding-box annotations. Box-level tags provide important cues about focus regions but lose the boundary delineation of the transition area. A recurrent constraint network (RCN) is introduced to address this challenge. In our static training, the RCN is jointly trained with a fully convolutional network (FCN) under box-level supervision; the RCN can generate a detailed focus map that effectively locates the boundary of the transition area. In our dynamic training, we iterate between fine-tuning the FCN and RCN with the generated pixel-level tags and generating finer new pixel-level tags. To further boost performance, a guided conditional random field is developed to improve the quality of the generated pixel-level tags. To promote further study of weakly supervised FRD methods, we construct a new dataset called FocusBox, which consists of 5000 challenging images with bounding-box-level labels. Experimental results on existing datasets demonstrate that our method not only yields results comparable to fully supervised counterparts but also runs faster.

Citations: 0
Deep Spatial and Temporal Network for Robust Visual Object Tracking.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-25 DOI: 10.1109/TIP.2019.2942502
Zhu Teng, Junliang Xing, Qiang Wang, Baopeng Zhang, Jianping Fan

There are two key components that can be leveraged for visual tracking: (a) object appearances; and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking due to its superior representation power and strong learning ability; most of them exploit object appearances, but few exploit object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics across multiple frames in a video, so that it can seamlessly integrate object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed into a tracking pipeline in a coarse-to-fine form, can perceive subtle differences in the spatial and temporal variations of the target (the object being tracked), and thus benefits from both off-line training and online fine-tuning. We have also conducted experiments over four of the largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and the results demonstrate that our DSTN method achieves competitive performance compared with state-of-the-art techniques. The source code, trained models, and all experimental results of this work will be made publicly available to facilitate further studies on this problem.

Citations: 0
Collective Affinity Learning for Partial Cross-Modal Hashing.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-23 DOI: 10.1109/TIP.2019.2941858
Jun Guo, Wenwu Zhu

In the past decade, various unsupervised hashing methods have been developed for cross-modal retrieval. However, in real-world applications, data are often incomplete: every modality may suffer from missing samples. Most existing works assume that every object appears in both modalities, hence they may not work well for partial multi-modal data. To address this problem, we propose a novel Collective Affinity Learning Method (CALM), which collectively and adaptively learns an anchor graph for generating binary codes on partial multi-modal data. In CALM, we first construct modality-specific bipartite graphs collectively and derive a probabilistic model to compute complete data-to-anchor affinities for each modality. Theoretical analysis reveals its ability to recover missing adjacency information. Moreover, a robust model is proposed to fuse these modality-specific affinities by adaptively learning a unified anchor graph. Then, the neighborhood information from the learned anchor graph acts as feedback that guides the preceding affinity-reconstruction procedure. To solve the formulated optimization problem, we further develop an effective algorithm with linear time complexity and fast convergence. Finally, Anchor Graph Hashing (AGH) is conducted on the fused affinities for cross-modal retrieval. Experimental results on benchmark datasets show that our proposed CALM consistently outperforms existing methods.
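
The affinity construction described above builds on the standard bipartite anchor graph. The sketch below, assuming toy NumPy features and a Gaussian kernel over the k nearest anchors, shows only this generic data-to-anchor affinity step; CALM's collective, modality-fused learning and its probabilistic recovery of missing affinities are not reproduced here.

```python
import numpy as np

def anchor_affinities(X, anchors, k=3, sigma=1.0):
    """Data-to-anchor affinity matrix Z (n x m) of a bipartite anchor graph.

    Each sample is linked to its k nearest anchors with Gaussian weights that
    are normalised to sum to one per row (the standard anchor-graph recipe).
    """
    # squared Euclidean distances between every sample and every anchor
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    rows = np.arange(X.shape[0])[:, None]
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest anchors per sample
    w = np.exp(-d2[rows, idx] / (2 * sigma ** 2))      # Gaussian kernel weights
    Z = np.zeros_like(d2)
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)    # row-normalised affinities
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))                     # toy features of one modality
    anchors = X[rng.choice(100, size=10, replace=False)]
    Z = anchor_affinities(X, anchors)
    print(Z.shape, Z.sum(axis=1)[:5])                  # (100, 10), rows sum to 1
```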

Citations: 0
Low-rank quaternion approximation for color image processing.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-19 DOI: 10.1109/TIP.2019.2941319
Yongyong Chen, Xiaolin Xiao, Yicong Zhou

Low-rank matrix approximation (LRMA)-based methods have achieved great success in grayscale image processing. When handling color images, LRMA either restores each color channel independently using the monochromatic model or processes the concatenation of the three color channels using the concatenation model. However, these two schemes may not make full use of the high correlation among RGB channels. To address this issue, we propose a novel low-rank quaternion approximation (LRQA) model. It contains two major components: first, instead of modeling a color image pixel as a scalar as in conventional sparse representation and LRMA-based methods, the color image is encoded as a pure quaternion matrix, so that the cross-channel correlation of the color channels can be well exploited; second, LRQA imposes a low-rank constraint on the constructed quaternion matrix. To better estimate the singular values of the underlying low-rank quaternion matrix from its noisy observation, a general model for LRQA is proposed based on several nonconvex functions. Extensive evaluations on color image denoising and inpainting tasks verify that LRQA achieves better performance than several state-of-the-art sparse representation and LRMA-based methods in terms of both quantitative metrics and visual quality.
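
As a rough illustration of the quaternion encoding, the sketch below maps an RGB image to the real block representation of a pure quaternion matrix and applies truncated SVD as a simple surrogate for the paper's nonconvex low-rank step; the function names and the rank-truncation rule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def rgb_to_quaternion_real(img):
    """Encode an H x W x 3 float image as the real block representation of a
    pure quaternion matrix Q = 0 + R*i + G*j + B*k (size 4H x 4W)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    z = np.zeros_like(r)
    # 4x4 block pattern of the real representation of a + bi + cj + dk, with a = 0
    return np.block([[ z, -r, -g, -b],
                     [ r,  z, -b,  g],
                     [ g,  b,  z, -r],
                     [ b, -g,  r,  z]])

def quaternion_real_to_rgb(M, h, w):
    """Read the R, G, B channels back from the real block representation."""
    return np.stack([M[h:2*h, 0:w], M[2*h:3*h, 0:w], M[3*h:4*h, 0:w]], axis=-1)

def low_rank_color_approx(img, rank):
    """Truncated SVD of the quaternion real representation, used here only as a
    simple surrogate for the nonconvex low-rank quaternion approximation."""
    h, w, _ = img.shape
    M = rgb_to_quaternion_real(img)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = 4 * rank                      # quaternion rank r corresponds to real rank 4r
    M_low = (U[:, :k] * s[:k]) @ Vt[:k]
    return quaternion_real_to_rgb(M_low, h, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 48, 3))     # toy color image
    out = low_rank_color_approx(img, rank=5)
    print(out.shape)                  # (32, 48, 3)
```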

Citations: 0
Noise-Robust Iterative Back-Projection.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-16 DOI: 10.1109/TIP.2019.2940414
Jun-Sang Yoo, Jong-Ok Kim

Noisy image super-resolution (SR) is a significantly challenging task because of the smoothing caused by denoising. Iterative back-projection (IBP) can help further enhance the reconstructed SR image, but no clean reference image is available. This paper proposes a novel back-projection algorithm for noisy image SR. Its main goal is to pursue consistency between the LR and SR images. We aim to estimate the clean reconstruction error to be back-projected, using the noisy and denoised reconstruction errors. We formulate a new cost function in the principal component analysis (PCA) transform domain to estimate the clean reconstruction error. In the data term of the cost function, the noisy and denoised reconstruction errors are combined in a region-adaptive manner using texture probability. In addition, a sparsity constraint is incorporated into the regularization term, based on the Laplacian characteristics of the reconstruction error. Finally, we propose an eigenvector estimation method to minimize the effect of noise. The experimental results demonstrate that the proposed method performs back-projection in a more noise-robust manner than conventional IBP and works harmoniously with any other SR method as a post-processing step.
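
For context, the conventional IBP baseline that this method extends can be sketched in a few lines: upsample, simulate the LR observation, and back-project the reconstruction error. The SciPy-based sketch below is only that baseline, with an illustrative blur and step size; the paper's contribution is to replace the raw error with a clean error estimated from the noisy and denoised errors, which is not shown.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def iterative_back_projection(lr, scale=2, n_iter=10, lam=0.5):
    """Classic IBP: refine an SR estimate so that, once blurred and downsampled,
    it reproduces the observed LR image."""
    sr = zoom(lr, scale, order=3)                      # initial upsampling
    for _ in range(n_iter):
        simulated_lr = zoom(gaussian_filter(sr, 1.0), 1.0 / scale, order=3)
        err = lr - simulated_lr                        # LR-domain reconstruction error
        sr += lam * zoom(err, scale, order=3)          # back-project the error
    return sr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr = rng.random((32, 32))                          # toy low-resolution image
    print(iterative_back_projection(lr).shape)         # (64, 64)
```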

Citations: 0
Learning Nonclassical Receptive Field Modulation for Contour Detection.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-16 DOI: 10.1109/TIP.2019.2940690
Qiling Tang, Nong Sang, Haihua Liu

This work develops a biologically inspired neural network for contour detection in natural images by combining the nonclassical receptive field modulation mechanism with a deep learning framework. The input image is first convolved with the local feature detectors to produce the classical receptive field responses, and then a corresponding modulatory kernel is constructed for each feature map to model the nonclassical receptive field modulation behaviors. The modulatory effects can activate a larger cortical area and thus allow cortical neurons to integrate a broader range of visual information to recognize complex cases. Additionally, to characterize spatial structures at various scales, a multiresolution technique is used to represent visual field information from fine to coarse. Different scale responses are combined to estimate the contour probability. Our method achieves state-of-the-art results among all biologically inspired contour detection models. This study provides a method for improving visual modeling of contour detection and inspires new ideas for integrating more brain cognitive mechanisms into deep neural networks.
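
To make the modulation mechanism concrete, the sketch below implements a hand-crafted analogue: a classical receptive field response (gradient magnitude) suppressed by an annular surround built from two Gaussian blurs. This is only an illustration of surround modulation under assumed filters and weights; the paper instead learns a modulatory kernel per feature map inside a deep network.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def surround_modulated_contours(img, sigma=2.0, alpha=1.0):
    """Classical RF response suppressed by the average response in an annular
    surround, a hand-crafted stand-in for nonclassical RF modulation."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    crf = np.hypot(gx, gy)                                   # classical RF response
    # annular surround: difference of two Gaussian blurs of the response
    surround = gaussian_filter(crf, 4 * sigma) - gaussian_filter(crf, sigma)
    surround = np.clip(surround, 0, None)
    contours = np.clip(crf - alpha * surround, 0, None)      # suppressed response
    return contours / (contours.max() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                               # toy image
    print(surround_modulated_contours(img).shape)            # (64, 64)
```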

Citations: 0
Efficient Single-Stage Pedestrian Detector by Asymptotic Localization Fitting and Multi-Scale Context Encoding.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-16 DOI: 10.1109/TIP.2019.2938877
Wei Liu, Shengcai Liao, Weidong Hu

Though Faster R-CNN based two-stage detectors have witnessed a significant boost in pedestrian detection accuracy, they are still too slow for practical applications. One solution is to simplify this workflow into a single-stage detector. However, current single-stage detectors (e.g., SSD) have not presented competitive accuracy on common pedestrian detection benchmarks. Accordingly, a structurally simple but effective module called Asymptotic Localization Fitting (ALF) is proposed, which stacks a series of predictors to directly evolve the default anchor boxes of SSD step by step to improve detection results. Additionally, combining the advantages of residual learning and multi-scale context encoding, a bottleneck block is proposed to enhance the predictors' discriminative power. On top of the above designs, an efficient single-stage detection architecture is designed, resulting in a pedestrian detector attractive in both accuracy and speed. A comprehensive set of experiments on two of the largest pedestrian detection datasets (i.e., CityPersons and Caltech) demonstrates the superiority of the proposed method compared with the state of the art on both benchmarks.
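
The core ALF mechanism, stacking predictors that evolve anchor boxes step by step, can be illustrated with the standard box-regression decoding applied in successive stages. In the sketch below the predictors are hypothetical callables returning random offsets; in the paper they are convolutional heads trained with progressively stricter matching criteria.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Standard box-regression decoding: apply (dx, dy, dw, dh) offsets to
    boxes given as (cx, cy, w, h)."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

def asymptotic_refinement(anchors, predictors):
    """ALF-style stacking: each predictor refines the boxes left by the previous
    one, so the default anchors evolve step by step toward the targets."""
    boxes = anchors
    for predict in predictors:        # predict: boxes -> per-box offsets
        boxes = decode_boxes(boxes, predict(boxes))
    return boxes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = np.array([[32.0, 32.0, 16.0, 40.0], [64.0, 48.0, 16.0, 40.0]])
    # two hypothetical refinement stages producing small random offsets
    stages = [lambda b: rng.normal(scale=0.05, size=b.shape) for _ in range(2)]
    print(asymptotic_refinement(anchors, stages))
```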

Citations: 0
Scalable Deep Hashing for Large-scale Social Image Retrieval.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-16 DOI: 10.1109/TIP.2019.2940693
Hui Cui, Lei Zhu, Jingjing Li, Yang Yang, Liqiang Nie

Recent years have witnessed the wide application of hashing to large-scale image retrieval because of its high computational efficiency and low storage cost. In particular, benefiting from current advances in deep learning, supervised deep hashing methods have greatly boosted retrieval performance under the strong supervision of large amounts of manually annotated semantic labels. However, their performance is highly dependent on the supervised labels, which significantly limits scalability. In contrast, unsupervised deep hashing without label dependence enjoys good scalability. Nevertheless, owing to the relaxed hash optimization and, more importantly, the lack of semantic guidance, existing methods suffer from limited retrieval performance. In this paper, we propose SCAlable Deep Hashing (SCADH) to learn enhanced hash codes for social image retrieval. We formulate a unified scalable deep hash learning framework that explores the weak but free supervision of discriminative user tags that commonly accompany social images. It jointly learns image representations and hash functions with deep neural networks, and simultaneously enhances the discriminative capability of image hash codes with the refined semantics from the accompanying social tags. Further, instead of simple relaxed hash optimization, we propose a discrete hash optimization method based on the Augmented Lagrangian Multiplier to directly solve for the hash codes and avoid binary quantization information loss. Experiments on two standard social image datasets demonstrate the superiority of the proposed approach compared with state-of-the-art shallow and deep hashing techniques.
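
For orientation, the retrieval side shared by deep hashing methods can be sketched as sign quantization followed by Hamming ranking. The NumPy sketch below shows only this generic step under assumed code lengths; SCADH's tag-supervised learning and its Augmented-Lagrangian discrete optimization, which avoids the sign-relaxation loss, are not reproduced.

```python
import numpy as np

def binarize(embeddings):
    """Sign quantisation: continuous network outputs -> {-1, +1} hash codes.
    SCADH optimises discrete codes directly to avoid the loss this relaxed
    rule introduces; it is shown here only as the common baseline."""
    return np.where(embeddings >= 0, 1, -1).astype(np.int32)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    nbits = db_codes.shape[1]
    # for +/-1 codes, Hamming distance = (nbits - inner product) / 2
    dist = (nbits - db_codes @ query_code) // 2
    return np.argsort(dist), dist

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = binarize(rng.normal(size=(1000, 64)))     # 64-bit codes for 1000 images
    query = binarize(rng.normal(size=64))
    order, dist = hamming_rank(query, db)
    print(order[:5], dist[order[:5]])              # nearest items and their distances
```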

Citations: 0
Domain-Transformable Sparse Representation for Anomaly Detection in Moving-Camera Videos.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-16 DOI: 10.1109/TIP.2019.2940686
Eric Jardim, Lucas A Thomaz, Eduardo A B da Silva, Sergio L Netto

This paper presents a special matrix factorization based on sparse representation that detects anomalies in video sequences generated with moving cameras. The representation is built by associating the frames of the target video, that is, a sequence to be tested for the presence of anomalies, with the frames of an anomaly-free reference video, which is a previously validated sequence. This factorization is done through a sparse coefficient matrix, and any target-video anomaly is encapsulated into a residue term. In order to cope with camera trepidations, domain transformations are incorporated into the sparse representation process. Approximations of the transformed-domain optimization problem are introduced to turn it into a feasible iterative process. Results obtained from a comprehensive video database acquired with moving cameras in a visually cluttered environment indicate that the proposed algorithm provides better geometric registration between reference and target videos, greatly improving the overall performance of the anomaly-detection system.
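
The residue idea can be illustrated by coding each target frame as a sparse combination of reference frames and reading the anomaly off the residual. The sketch below uses scikit-learn's Lasso as a stand-in sparse coder on toy frames; the domain transformations that compensate camera motion, central to the paper, are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_residue(target_frame, reference_frames, alpha=0.01):
    """Represent a flattened target frame as a sparse combination of
    anomaly-free reference frames; whatever cannot be explained ends up in the
    residue, which is taken as the anomaly map."""
    y = target_frame.ravel().astype(float)
    D = np.stack([f.ravel() for f in reference_frames], axis=1)  # pixels x refs
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(D, y)
    residue = y - D @ coder.coef_
    return residue.reshape(target_frame.shape), coder.coef_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = [rng.random((24, 32)) for _ in range(8)]      # anomaly-free reference frames
    target = 0.6 * refs[0] + 0.4 * refs[3]
    target[5:9, 10:14] += 1.0                            # synthetic anomaly
    res, coef = sparse_residue(target, refs)
    print(np.abs(res).max(), coef.round(2))              # residue peaks at the anomaly
```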

Citations: 0
Hazy Image Decolorization with Color Contrast Restoration.
IF 10.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2019-09-12 DOI: 10.1109/TIP.2019.2939946
Wei Wang, Zhengguo Li, Shiqian Wu, Liangcai Zeng

It is challenging to convert a hazy color image into a gray-scale image because the color contrast field of a hazy image is distorted. In this paper, a novel decolorization algorithm is proposed to transform a hazy image into a distortion-recovered gray-scale image. To recover the color contrast field, the relationship between the restored color contrast and its distorted input is formulated in the CIELab color space. Based on this restoration, a nonlinear optimization problem is formulated to construct the resulting gray-scale image. A new differentiable approximation, built on an extension of the Huber loss function, is introduced to solve this problem. Experimental results show that the proposed algorithm effectively preserves global luminance consistency while representing the original color contrast in gray scale, and the result is very close to the corresponding ground-truth gray-scale image.
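
Since the solver rests on an extension of the Huber loss, the sketch below shows the standard Huber loss and its gradient, applied to a toy linear fit of gray-scale weights; the paper's specific extension and its CIELab contrast model are not reproduced here.

```python
import numpy as np

def huber(r, delta=1.0):
    """Standard Huber loss: quadratic for small residuals, linear for large ones."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def huber_grad(r, delta=1.0):
    """Gradient of the Huber loss with respect to the residual."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

if __name__ == "__main__":
    # toy robust fit: recover per-channel weights w so that X @ w matches a
    # target signal under the Huber data term (purely illustrative setup)
    rng = np.random.default_rng(0)
    X = rng.random((500, 3))                              # toy RGB features
    target = X @ np.array([0.30, 0.59, 0.11]) + rng.normal(scale=0.05, size=500)
    w = np.zeros(3)
    for _ in range(2000):                                 # plain gradient descent
        r = X @ w - target
        w -= 0.1 * X.T @ huber_grad(r) / len(r)
    print(w.round(3), huber(X @ w - target).mean().round(5))   # near luminance weights
```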

Citations: 0