Towards weakly-supervised focus region detection via recurrent constraint network
Pub Date: 2019-09-25 | DOI: 10.1109/TIP.2019.2942505
Wenda Zhao, Xueqing Hou, Xiaobing Yu, You He, Huchuan Lu
Recent state-of-the-art methods on focus region detection (FRD) rely on deep convolutional networks trained with costly pixel-level annotations. In this study, we propose an FRD method that achieves competitive accuracy while using only easily obtained bounding box annotations. Box-level tags provide important cues about focus regions but lose the boundary delineation of the transition area. A recurrent constraint network (RCN) is introduced to address this challenge. In our static training, the RCN is jointly trained with a fully convolutional network (FCN) under box-level supervision. The RCN can generate a detailed focus map that locates the boundary of the transition area effectively. In our dynamic training, we iterate between fine-tuning the FCN and RCN with the generated pixel-level tags and generating finer new pixel-level tags. To further boost performance, a guided conditional random field is developed to improve the quality of the generated pixel-level tags. To promote further study of weakly supervised FRD methods, we construct a new dataset called FocusBox, which consists of 5000 challenging images with bounding box-level labels. Experimental results on existing datasets demonstrate that our method not only yields results comparable to those of fully supervised counterparts but also runs faster.
{"title":"Towards weakly-supervised focus region detection via recurrent constraint network.","authors":"Wenda Zhao, Xueqing Hou, Xiaobing Yu, You He, Huchuan Lu","doi":"10.1109/TIP.2019.2942505","DOIUrl":"10.1109/TIP.2019.2942505","url":null,"abstract":"<p><p>Recent state-of-the-art methods on focus region detection (FRD) rely on deep convolutional networks trained with costly pixel-level annotations. In this study, we propose a FRD method that achieves competitive accuracies but only uses easily obtained bounding box annotations. Box-level tags provide important cues of focus regions but lose the boundary delineation of the transition area. A recurrent constraint network (RCN) is introduced for this challenge. In our static training, RCN is jointly trained with a fully convolutional network (FCN) through box-level supervision. The RCN can generate a detailed focus map to locate the boundary of the transition area effectively. In our dynamic training, we iterate between fine-tuning FCN and RCN with the generated pixel-level tags and generate finer new pixel-level tags. To boost the performance further, a guided conditional random field is developed to improve the quality of the generated pixel-level tags. To promote further study of the weakly supervised FRD methods, we construct a new dataset called FocusBox, which consists of 5000 challenging images with bounding box-level labels. Experimental results on existing datasets demonstrate that our method not only yields comparable results than fully supervised counterparts but also achieves a faster speed.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62588586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Spatial and Temporal Network for Robust Visual Object Tracking
Pub Date: 2019-09-25 | DOI: 10.1109/TIP.2019.2942502
Zhu Teng, Junliang Xing, Qiang Wang, Baopeng Zhang, Jianping Fan
There are two key components that can be leveraged for visual tracking: (a) object appearances and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking thanks to its superior representation power and strong learning ability, yet most of them exploit object appearances and few exploit object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics along multiple frames in a video, so that it can seamlessly integrate object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed in a coarse-to-fine tracking pipeline, can perceive subtle differences in the spatial and temporal variations of the target (the object being tracked), and it benefits from both off-line training and online fine-tuning. We also conducted experiments on four of the largest tracking benchmarks, namely OTB-2013, OTB-2015, VOT2015, and VOT2017, and the results demonstrate that our DSTN method achieves competitive performance compared with state-of-the-art techniques. The source code, trained models, and all experimental results of this work will be made publicly available to facilitate further studies on this problem.
{"title":"Deep Spatial and Temporal Network for Robust Visual Object Tracking.","authors":"Zhu Teng, Junliang Xing, Qiang Wang, Baopeng Zhang, Jianping Fan","doi":"10.1109/TIP.2019.2942502","DOIUrl":"10.1109/TIP.2019.2942502","url":null,"abstract":"<p><p>There are two key components that can be leveraged for visual tracking: (a) object appearances; and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking due to its superior representation power and strong learning ability, where most of them employed object appearances but few of them exploited object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics along multiple frames in a video, such that it can seamlessly integrate the object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed into a tracking pipeline in a coarse-to-fine form, can perceive the subtle differences on spatial and temporal variations of the target (object being tracked), and thus it benefits from both off-line training and online fine-tuning. We have also conducted our experiments over four largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and our experimental results have demonstrated that our DSTN method can achieve competitive performance as compared with the state-of-the-art techniques. The source code, trained models, and all the experimental results of this work will be made public available to facilitate further studies on this problem.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62588807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collective Affinity Learning for Partial Cross-Modal Hashing
Pub Date: 2019-09-23 | DOI: 10.1109/TIP.2019.2941858
Jun Guo, Wenwu Zhu
In the past decade, various unsupervised hashing methods have been developed for cross-modal retrieval. However, in real-world applications data are often incomplete, and every modality may suffer from missing samples. Most existing works assume that every object appears in both modalities, so they may not work well for partial multi-modal data. To address this problem, we propose a novel Collective Affinity Learning Method (CALM), which collectively and adaptively learns an anchor graph for generating binary codes on partial multi-modal data. In CALM, we first construct modality-specific bipartite graphs collectively and derive a probabilistic model to infer complete data-to-anchor affinities for each modality. Theoretical analysis reveals its ability to recover missing adjacency information. Moreover, a robust model is proposed to fuse these modality-specific affinities by adaptively learning a unified anchor graph. The neighborhood information from the learned anchor graph then acts as feedback that guides the preceding affinity reconstruction procedure. To solve the formulated optimization problem, we further develop an effective algorithm with linear time complexity and fast convergence. Finally, Anchor Graph Hashing (AGH) is performed on the fused affinities for cross-modal retrieval. Experimental results on benchmark datasets show that the proposed CALM consistently outperforms existing methods.
{"title":"Collective Affinity Learning for Partial Cross-Modal Hashing.","authors":"Jun Guo, Wenwu Zhu","doi":"10.1109/TIP.2019.2941858","DOIUrl":"10.1109/TIP.2019.2941858","url":null,"abstract":"<p><p>In the past decade, various unsupervised hashing methods have been developed for cross-modal retrieval. However, in real-world applications, it is often the incomplete case that every modality of data may suffer from some missing samples. Most existing works assume that every object appears in both modalities, hence they may not work well for partial multi-modal data. To address this problem, we propose a novel Collective Affinity Learning Method (CALM), which collectively and adaptively learns an anchor graph for generating binary codes on partial multi-modal data. In CALM, we first construct modality-specific bipartite graphs collectively, and derive a probabilistic model to figure out complete data-to-anchor affinities for each modality. Theoretical analysis reveals its ability to recover missing adjacency information. Moreover, a robust model is proposed to fuse these modality-specific affinities by adaptively learning a unified anchor graph. Then, the neighborhood information from the learned anchor graph acts as feedback, which guides the previous affinity reconstruction procedure. To solve the formulated optimization problem, we further develop an effective algorithm with linear time complexity and fast convergence. Last, Anchor Graph Hashing (AGH) is conducted on the fused affinities for cross-modal retrieval. Experimental results on benchmark datasets show that our proposed CALM consistently outperforms the existing methods.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62588314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-rank quaternion approximation for color image processing
Pub Date: 2019-09-19 | DOI: 10.1109/TIP.2019.2941319
Yongyong Chen, Xiaolin Xiao, Yicong Zhou
Low-rank matrix approximation (LRMA)-based methods have achieved great success in grayscale image processing. When handling color images, LRMA either restores each color channel independently using the monochromatic model or processes the concatenation of the three color channels using the concatenation model. However, these two schemes may not make full use of the high correlation among the RGB channels. To address this issue, we propose a novel low-rank quaternion approximation (LRQA) model. It contains two major components: first, instead of modeling a color pixel as a scalar as in conventional sparse representation and LRMA-based methods, the color image is encoded as a pure quaternion matrix, so that the cross-channel correlation of the color channels can be well exploited; second, LRQA imposes a low-rank constraint on the constructed quaternion matrix. To better estimate the singular values of the underlying low-rank quaternion matrix from its noisy observation, a general LRQA model is proposed based on several nonconvex functions. Extensive evaluations on color image denoising and inpainting tasks verify that LRQA achieves better performance than several state-of-the-art sparse representation and LRMA-based methods in terms of both quantitative metrics and visual quality.
{"title":"Low-rank quaternion approximation for color image processing.","authors":"Yongyong Chen, Xiaolin Xiao, Yicong Zhou","doi":"10.1109/TIP.2019.2941319","DOIUrl":"10.1109/TIP.2019.2941319","url":null,"abstract":"<p><p>Low-rank matrix approximation (LRMA)-based methods have made a great success for grayscale image processing. When handling color images, LRMA either restores each color channel independently using the monochromatic model or processes the concatenation of three color channels using the concatenation model. However, these two schemes may not make full use of the high correlation among RGB channels. To address this issue, we propose a novel low-rank quaternion approximation (LRQA) model. It contains two major components: first, instead of modeling a color image pixel as a scalar in conventional sparse representation and LRMA-based methods, the color image is encoded as a pure quaternion matrix, such that the cross-channel correlation of color channels can be well exploited; second, LRQA imposes the low-rank constraint on the constructed quaternion matrix. To better estimate the singular values of the underlying low-rank quaternion matrix from its noisy observation, a general model for LRQA is proposed based on several nonconvex functions. Extensive evaluations for color image denoising and inpainting tasks verify that LRQA achieves better performance over several state-of-the-art sparse representation and LRMA-based methods in terms of both quantitative metrics and visual quality.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noise-Robust Iterative Back-Projection
Pub Date: 2019-09-16 | DOI: 10.1109/TIP.2019.2940414
Jun-Sang Yoo, Jong-Ok Kim
Noisy image super-resolution (SR) is a highly challenging task due to the smoothing introduced by denoising. Iterative back-projection (IBP) can help to further enhance the reconstructed SR image, but no clean reference image is available. This paper proposes a novel back-projection algorithm for noisy image SR whose main goal is to pursue consistency between the LR and SR images. We aim to estimate the clean reconstruction error to be back-projected, using the noisy and denoised reconstruction errors. We formulate a new cost function in the principal component analysis (PCA) transform domain to estimate the clean reconstruction error. In the data term of the cost function, the noisy and denoised reconstruction errors are combined in a region-adaptive manner using texture probability. In addition, a sparsity constraint is incorporated into the regularization term, based on the Laplacian characteristics of the reconstruction error. Finally, we propose an eigenvector estimation method to minimize the effect of noise. The experimental results demonstrate that the proposed method performs back-projection in a more noise-robust manner than conventional IBP and works harmoniously with other SR methods as a post-processing step.
Learning Nonclassical Receptive Field Modulation for Contour Detection
Pub Date: 2019-09-16 | DOI: 10.1109/TIP.2019.2940690
Qiling Tang, Nong Sang, Haihua Liu
This work develops a biologically inspired neural network for contour detection in natural images by combining the nonclassical receptive field modulation mechanism with a deep learning framework. The input image is first convolved with the local feature detectors to produce the classical receptive field responses, and then a corresponding modulatory kernel is constructed for each feature map to model the nonclassical receptive field modulation behaviors. The modulatory effects can activate a larger cortical area and thus allow cortical neurons to integrate a broader range of visual information to recognize complex cases. Additionally, to characterize spatial structures at various scales, a multiresolution technique is used to represent visual field information from fine to coarse. Different scale responses are combined to estimate the contour probability. Our method achieves state-of-the-art results among all biologically inspired contour detection models. This study provides a method for improving visual modeling of contour detection and inspires new ideas for integrating more brain cognitive mechanisms into deep neural networks.
{"title":"Learning Nonclassical Receptive Field Modulation for Contour Detection.","authors":"Qiling Tang, Nong Sang, Haihua Liu","doi":"10.1109/TIP.2019.2940690","DOIUrl":"10.1109/TIP.2019.2940690","url":null,"abstract":"<p><p>This work develops a biologically inspired neural network for contour detection in natural images by combining the nonclassical receptive field modulation mechanism with a deep learning framework. The input image is first convolved with the local feature detectors to produce the classical receptive field responses, and then a corresponding modulatory kernel is constructed for each feature map to model the nonclassical receptive field modulation behaviors. The modulatory effects can activate a larger cortical area and thus allow cortical neurons to integrate a broader range of visual information to recognize complex cases. Additionally, to characterize spatial structures at various scales, a multiresolution technique is used to represent visual field information from fine to coarse. Different scale responses are combined to estimate the contour probability. Our method achieves state-of-the-art results among all biologically inspired contour detection models. This study provides a method for improving visual modeling of contour detection and inspires new ideas for integrating more brain cognitive mechanisms into deep neural networks.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Single-Stage Pedestrian Detector by Asymptotic Localization Fitting and Multi-Scale Context Encoding
Pub Date: 2019-09-16 | DOI: 10.1109/TIP.2019.2938877
Wei Liu, Shengcai Liao, Weidong Hu
Though Faster R-CNN based two-stage detectors have brought a significant boost in pedestrian detection accuracy, they are still too slow for practical applications. One solution is to simplify this workflow into a single-stage detector. However, current single-stage detectors (e.g., SSD) have not achieved competitive accuracy on common pedestrian detection benchmarks. Accordingly, a structurally simple but effective module called Asymptotic Localization Fitting (ALF) is proposed, which stacks a series of predictors to directly evolve the default anchor boxes of SSD step by step and thereby improve detection results. Additionally, combining the advantages of residual learning and multi-scale context encoding, a bottleneck block is proposed to enhance the predictors' discriminative power. On top of these designs, an efficient single-stage detection architecture is built, resulting in a pedestrian detector that is attractive in both accuracy and speed. A comprehensive set of experiments on two of the largest pedestrian detection datasets (i.e., CityPersons and Caltech) demonstrates the superiority of the proposed method over the state of the art on both benchmarks.
{"title":"Efficient Single-Stage Pedestrian Detector by Asymptotic Localization Fitting and Multi-Scale Context Encoding.","authors":"Wei Liu, Shengcai Liao, Weidong Hu","doi":"10.1109/TIP.2019.2938877","DOIUrl":"10.1109/TIP.2019.2938877","url":null,"abstract":"<p><p>Though Faster R-CNN based two-stage detectors have witnessed significant boost in pedestrian detection accuracy, they are still slow for practical applications. One solution is to simplify this working flow as a single-stage detector. However, current single-stage detectors (e.g. SSD) have not presented competitive accuracy on common pedestrian detection benchmarks. Accordingly, a structurally simple but effective module called Asymptotic Localization Fitting (ALF) is proposed, which stacks a series of predictors to directly evolve the default anchor boxes of SSD step by step to improve detection results. Additionally, combining the advantages from residual learning and multi-scale context encoding, a bottleneck block is proposed to enhance the predictors' discriminative power. On top of the above designs, an efficient single-stage detection architecture is designed, resulting in an attractive pedestrian detector in both accuracy and speed. A comprehensive set of experiments on two of the largest pedestrian detection datasets (i.e. CityPersons and Caltech) demonstrate the superiority of the proposed method, comparing to the state of the arts on both the benchmarks.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Deep Hashing for Large-scale Social Image Retrieval
Pub Date: 2019-09-16 | DOI: 10.1109/TIP.2019.2940693
Hui Cui, Lei Zhu, Jingjing Li, Yang Yang, Liqiang Nie
Recent years have witnessed the wide application of hashing to large-scale image retrieval because of its high computational efficiency and low storage cost. In particular, benefiting from recent advances in deep learning, supervised deep hashing methods have greatly boosted retrieval performance under the strong supervision of large amounts of manually annotated semantic labels. However, their performance is highly dependent on the supervised labels, which significantly limits scalability. In contrast, unsupervised deep hashing without label dependence enjoys good scalability. Nevertheless, due to the relaxed hash optimization and, more importantly, the lack of semantic guidance, existing methods suffer from limited retrieval performance. In this paper, we propose SCAlable Deep Hashing (SCADH) to learn enhanced hash codes for social image retrieval. We formulate a unified scalable deep hash learning framework which explores the weak but free supervision of the discriminative user tags that commonly accompany social images. It jointly learns image representations and hash functions with deep neural networks, and simultaneously enhances the discriminative capability of the image hash codes with the refined semantics from the accompanying social tags. Further, instead of simple relaxed hash optimization, we propose a discrete hash optimization method based on the Augmented Lagrangian Multiplier to solve the hash codes directly and avoid the binary quantization information loss. Experiments on two standard social image datasets demonstrate the superiority of the proposed approach compared with state-of-the-art shallow and deep hashing techniques.
Domain-Transformable Sparse Representation for Anomaly Detection in Moving-Camera Videos
Pub Date: 2019-09-16 | DOI: 10.1109/TIP.2019.2940686
Eric Jardim, Lucas A Thomaz, Eduardo A B da Silva, Sergio L Netto
This paper presents a special matrix factorization based on sparse representation that detects anomalies in video sequences generated with moving cameras. The representation is built by associating the frames of the target video, i.e., the sequence to be tested for the presence of anomalies, with the frames of an anomaly-free reference video, which is a previously validated sequence. The factorization is performed through a sparse coefficient matrix, and any target-video anomaly is encapsulated in a residue term. In order to cope with camera trepidations, domain transformations are incorporated into the sparse representation process. Approximations of the transformed-domain optimization problem are introduced to turn it into a feasible iterative process. Results obtained on a comprehensive video database acquired with moving cameras in a visually cluttered environment indicate that the proposed algorithm provides better geometric registration between the reference and target videos, greatly improving the overall performance of the anomaly-detection system.
{"title":"Domain-Transformable Sparse Representation for Anomaly Detection in Moving-Camera Videos.","authors":"Eric Jardim, Lucas A Thomaz, Eduardo A B da Silva, Sergio L Netto","doi":"10.1109/TIP.2019.2940686","DOIUrl":"10.1109/TIP.2019.2940686","url":null,"abstract":"<p><p>This paper presents a special matrix factorization based on sparse representation that detects anomalies in video sequences generated with moving cameras. Such representation is made by associating the frames of the target video, that is a sequence to be tested for the presence of anomalies, with the frames of an anomaly-free reference video, which is a previously validated sequence. This factorization is done by a sparse coefficient matrix, and any target-video anomaly is encapsulated into a residue term. In order to cope with camera trepidations, domaintransformations are incorporated into the sparse representation process. Approximations of the transformed-domain optimization problem are introduced to turn it into a feasible iterative process. Results obtained from a comprehensive video database acquired with moving cameras on a visually cluttered environment indicate that the proposed algorithm provides a better geometric registration between reference and target videos, greatly improving the overall performance of the anomaly-detection system.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hazy Image Decolorization with Color Contrast Restoration
Pub Date: 2019-09-12 | DOI: 10.1109/TIP.2019.2939946
Wei Wang, Zhengguo Li, Shiqian Wu, Liangcai Zeng
It is challenging to convert a hazy color image into a gray-scale image because the color contrast field of a hazy image is distorted. In this paper, a novel decolorization algorithm is proposed to transform a hazy image into a distortion-recovered gray-scale image. To recover the color contrast field, the relationship between the restored color contrast and its distorted input is formulated in the CIELab color space. Based on this restoration, a nonlinear optimization problem is posed to construct the resulting gray-scale image. A new differentiable approximation is introduced to solve this problem with an extension of the Huber loss function. Experimental results show that the proposed algorithm effectively preserves global luminance consistency while representing the original color contrast in gray-scale, coming very close to the corresponding ground-truth gray-scale image.
{"title":"Hazy Image Decolorization with Color Contrast Restoration.","authors":"Wei Wang, Zhengguo Li, Shiqian Wu, Liangcai Zeng","doi":"10.1109/TIP.2019.2939946","DOIUrl":"10.1109/TIP.2019.2939946","url":null,"abstract":"<p><p>It is challenging to convert a hazy color image into a gray-scale image because the color contrast field of a hazy image is distorted. In this paper, a novel decolorization algorithm is proposed to transfer a hazy image into a distortionrecovered gray-scale image. To recover the color contrast field, the relationship between the restored color contrast and its distorted input is presented in CIELab color space. Based on this restoration, a nonlinear optimization problem is formulated to construct the resultant gray-scale image. A new differentiable approximation solution is introduced to solve this problem with an extension of the Huber loss function. Experimental results show that the proposed algorithm effectively preserves the global luminance consistency while represents the original color contrast in gray-scales, which is very close to the corresponding ground truth gray-scale one.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}