IEEE Transactions on Image Processing最新文献_第8页

An Efficient Algorithm for the Piecewise Affine-Linear Mumford-Shah Model Based on a Taylor Jet Splitting. 基于泰勒射流分割的分段式 Affine-Linear 芒福德-沙模型的高效算法

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-28 DOI: 10.1109/TIP.2019.2937040

Lukas Kiefer, Martin Storath, Andreas Weinmann

We propose an algorithm to efficiently compute approximate solutions of the piecewise affine Mumford-Shah model. The algorithm is based on a novel reformulation of the underlying optimization problem in terms of Taylor jets. A splitting approach leads to linewise segmented jet estimation problems for which we propose an exact and efficient solver. The proposed method has the combined advantages of prior algorithms: it directly yields a partition, it does not need an initialization procedure, and it is highly parallelizable. The experiments show that the algorithm has lower computation times and that the solutions often have lower functional values than the state-of-the-art.

我们提出了一种高效计算片断仿射 Mumford-Shah 模型近似解的算法。该算法基于以泰勒射流为基础的基本优化问题的一种新的重新表述。分割方法会导致线性分段喷流估计问题，我们为此提出了一种精确高效的求解器。我们提出的方法具有先前算法的综合优势：它能直接生成分区，不需要初始化程序，而且可高度并行化。实验表明，该算法的计算时间更短，求解的函数值往往低于最先进的算法。

引用次数: 0

Neural Compatibility Modeling with Probabilistic Knowledge Distillation. 利用概率知识蒸馏建立神经兼容性模型。

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-27 DOI: 10.1109/TIP.2019.2936742

Xianjing Han, Xuemeng Song, Yiyang Yao, Xin-Shun Xu, Liqiang Nie

In modern society, clothing matching plays a pivotal role in people's daily life, as suitable outfits can beautify their appearance directly. Nevertheless, how to make a suitable outfit has become a daily headache for many people, especially those who do not have much sense of aesthetics. In the light of this, many research efforts have been dedicated to the task of complementary clothing matching and have achieved great success relying on the advanced data-driven neural networks. However, most existing methods overlook the rich valuable knowledge accumulated by our human beings in the fashion domain, especially the rules regarding clothing matching, like "coats go with dresses" and "silk tops cannot go with chiffon bottoms". Towards this end, in this work, we propose a knowledge-guided neural compatibility modeling scheme, which is able to incorporate the rich fashion domain knowledge to enhance the performance of the compatibility modeling in the context of clothing matching. To better integrate the huge and implicit fashion domain knowledge into the data-driven neural networks, we present a probabilistic knowledge distillation (PKD) method, which is able to encode vast knowledge rules in a probabilistic manner. Extensive experiments on two real-world datasets have verified the guidance of rules from different sources and demonstrated the effectiveness and portability of our model. As a byproduct, we released the codes and involved parameters to benefit the research community.

在现代社会，服装搭配在人们的日常生活中起着举足轻重的作用，因为合适的服装可以直接美化人们的外表。然而，如何搭配出一套合适的服装却成了许多人，尤其是那些缺乏审美意识的人每天头疼的问题。有鉴于此，许多研究人员都致力于服装搭配的补充工作，并依靠先进的数据驱动神经网络取得了巨大成功。然而，大多数现有方法都忽略了人类在时尚领域积累的丰富宝贵知识，尤其是有关服装搭配的规则，如 "大衣搭配连衣裙 "和 "丝质上衣不能搭配雪纺下装"。为此，我们在这项工作中提出了一种知识引导的神经兼容性建模方案，该方案能够结合丰富的时尚领域知识来提高服装搭配中的兼容性建模性能。为了更好地将庞大而隐含的时尚领域知识整合到数据驱动的神经网络中，我们提出了一种概率知识提炼（PKD）方法，它能够以概率的方式对庞大的知识规则进行编码。在两个真实世界数据集上进行的广泛实验验证了不同来源规则的指导性，并证明了我们模型的有效性和可移植性。作为副产品，我们发布了代码和相关参数，以造福研究界。

{"title":"Neural Compatibility Modeling with Probabilistic Knowledge Distillation.","authors":"Xianjing Han, Xuemeng Song, Yiyang Yao, Xin-Shun Xu, Liqiang Nie","doi":"10.1109/TIP.2019.2936742","DOIUrl":"10.1109/TIP.2019.2936742","url":null,"abstract":"In modern society, clothing matching plays a pivotal role in people's daily life, as suitable outfits can beautify their appearance directly. Nevertheless, how to make a suitable outfit has become a daily headache for many people, especially those who do not have much sense of aesthetics. In the light of this, many research efforts have been dedicated to the task of complementary clothing matching and have achieved great success relying on the advanced data-driven neural networks. However, most existing methods overlook the rich valuable knowledge accumulated by our human beings in the fashion domain, especially the rules regarding clothing matching, like \"coats go with dresses\" and \"silk tops cannot go with chiffon bottoms\". Towards this end, in this work, we propose a knowledge-guided neural compatibility modeling scheme, which is able to incorporate the rich fashion domain knowledge to enhance the performance of the compatibility modeling in the context of clothing matching. To better integrate the huge and implicit fashion domain knowledge into the data-driven neural networks, we present a probabilistic knowledge distillation (PKD) method, which is able to encode vast knowledge rules in a probabilistic manner. Extensive experiments on two real-world datasets have verified the guidance of rules from different sources and demonstrated the effectiveness and portability of our model. As a byproduct, we released the codes and involved parameters to benefit the research community.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

3D Point Cloud Attribute Compression Using Geometry-Guided Sparse Representation. 利用几何引导的稀疏表示压缩三维点云属性

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-27 DOI: 10.1109/TIP.2019.2936738

Shuai Gu, Junhui Hou, Huanqiang Zeng, Hui Yuan, Kai-Kuang Ma

3D point clouds associated with attributes are considered as a promising paradigm for immersive communication. However, the corresponding compression schemes for this media are still in the infant stage. Moreover, in contrast to conventional image/video compression, it is a more challenging task to compress 3D point cloud data, arising from the irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, an input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, which is formulated as an ℓ0-norm regularized optimization problem. Also, an inter-block prediction scheme is applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the resulting transform coefficients by GSR, an effective entropy coding strategy that is tailored to our GSR is developed to generate the bitstream. Experimental results over various benchmark datasets show that the proposed compression scheme is able to achieve better rate-distortion performance and visual quality, compared with state-of-the-art methods.

与属性相关的三维点云被认为是身临其境通信的一种有前途的范例。然而，针对这种媒体的相应压缩方案仍处于起步阶段。此外，与传统的图像/视频压缩相比，由于三维点云的结构不规则，压缩三维点云数据是一项更具挑战性的任务。在本文中，我们针对体素化三维点云的属性提出了一种新颖有效的压缩方案。在第一阶段，输入的体素化三维点云被分割成大小相等的块。然后，针对三维点云的不规则结构，提出了一种几何引导稀疏表示法（GSR）来消除每个块内的冗余，并将其表述为一个 ℓ0-norm 正则化优化问题。此外，还采用了一种块间预测方案来消除块之间的冗余。最后，通过定量分析 GSR 所产生的变换系数的特征，开发出一种适合我们的 GSR 的有效熵编码策略来生成比特流。各种基准数据集的实验结果表明，与最先进的方法相比，所提出的压缩方案能够获得更好的速率-失真性能和视觉质量。

{"title":"3D Point Cloud Attribute Compression Using Geometry-Guided Sparse Representation.","authors":"Shuai Gu, Junhui Hou, Huanqiang Zeng, Hui Yuan, Kai-Kuang Ma","doi":"10.1109/TIP.2019.2936738","DOIUrl":"10.1109/TIP.2019.2936738","url":null,"abstract":"3D point clouds associated with attributes are considered as a promising paradigm for immersive communication. However, the corresponding compression schemes for this media are still in the infant stage. Moreover, in contrast to conventional image/video compression, it is a more challenging task to compress 3D point cloud data, arising from the irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, an input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, which is formulated as an ℓ0-norm regularized optimization problem. Also, an inter-block prediction scheme is applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the resulting transform coefficients by GSR, an effective entropy coding strategy that is tailored to our GSR is developed to generate the bitstream. Experimental results over various benchmark datasets show that the proposed compression scheme is able to achieve better rate-distortion performance and visual quality, compared with state-of-the-art methods.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Video Saliency Prediction using Spatiotemporal Residual Attentive Networks. 利用时空残留注意力网络进行视频显著性预测

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-23 DOI: 10.1109/TIP.2019.2936112

Qiuxia Lai, Wenguan Wang, Hanqiu Sun, Jianbing Shen

This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues, i.e, effective spatiotemporal feature integration and multi-scale saliency learning. For the first problem, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Beyond traditional two-stream models learning appearance and motion features separately, such design allows early, multi-path information exchange between different domains, leading to a unified and powerful spatiotemporal learning architecture. For the second one, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end. It is used for enhancing the fused spatiotemporal features via emphasizing important features in multi-scales. A lightweight convolutional Gated Recurrent Unit (convGRU), which is flexible for small training data situation, is used for long-term temporal characteristics modeling. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over other competitors and the effectiveness of each component of our network. Our code and all the results will be available at https://github.com/ashleylqx/STRA-Net.

本文提出了一种新颖的残差注意力学习网络架构，用于预测动态眼动图。所提出的模型强调两个基本问题，即有效的时空特征整合和多尺度显著性学习。对于第一个问题，外观流和运动流通过密集的残差交叉连接紧密耦合，以残差和密集的方式将外观信息与多层综合运动特征整合在一起。与传统的分别学习外观和运动特征的双流模型相比，这种设计允许在不同领域之间进行早期、多路径的信息交换，从而形成统一而强大的时空学习架构。其次，我们提出了一种复合注意力机制，可以端到端学习多尺度局部注意力和全局注意力先验。它通过强调多尺度的重要特征来增强融合的时空特征。轻量级卷积门控递归单元（convGRU）可灵活应对训练数据较少的情况，被用于长期时空特征建模。在四个基准数据集上进行的广泛实验清楚地证明了所提出的视频显著性模型相对于其他竞争对手的优势，以及我们网络中每个组件的有效性。我们的代码和所有结果将公布在 https://github.com/ashleylqx/STRA-Net 网站上。

{"title":"Video Saliency Prediction using Spatiotemporal Residual Attentive Networks.","authors":"Qiuxia Lai, Wenguan Wang, Hanqiu Sun, Jianbing Shen","doi":"10.1109/TIP.2019.2936112","DOIUrl":"10.1109/TIP.2019.2936112","url":null,"abstract":"This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues, i.e, effective spatiotemporal feature integration and multi-scale saliency learning. For the first problem, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Beyond traditional two-stream models learning appearance and motion features separately, such design allows early, multi-path information exchange between different domains, leading to a unified and powerful spatiotemporal learning architecture. For the second one, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end. It is used for enhancing the fused spatiotemporal features via emphasizing important features in multi-scales. A lightweight convolutional Gated Recurrent Unit (convGRU), which is flexible for small training data situation, is used for long-term temporal characteristics modeling. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over other competitors and the effectiveness of each component of our network. Our code and all the results will be available at https://github.com/ashleylqx/STRA-Net.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Category-Aware Spatial Constraint for Weakly Supervised Detection. 弱监督检测的类别感知空间约束

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-22 DOI: 10.1109/TIP.2019.2933735

Yunhang Shen, Rongrong Ji, Kuiyuan Yang, Cheng Deng, Changhu Wang

Weakly supervised object detection has attracted increasing research attention recently. To this end, most existing schemes rely on scoring category-independent region proposals, which is formulated as a multiple instance learning problem. During this process, the proposal scores are aggregated and supervised by only image-level labels, which often fails to locate object boundaries precisely. In this paper, we break through such a restriction by taking a deeper look into the score aggregation stage and propose a Category-aware Spatial Constraint (CSC) scheme for proposals, which is integrated into weakly supervised object detection in an end-to-end learning manner. In particular, we incorporate the global shape information of objects as an unsupervised constraint, which is inferred from build-in foreground-and-background cues, termed Category-specific Pixel Gradient (CPG) maps. Specifically, each region proposal is weighted according to how well it covers the estimated shape of objects. For each category, a multi-center regularization is further introduced to penalize the violations between centers cluster and high-score proposals in a given image. Extensive experiments are done on the most widely-used benchmark Pascal VOC and COCO, which shows that our approach significantly improves weakly supervised object detection without adding new learnable parameters to the existing models nor changing the structures of CNNs.

近来，弱监督物体检测引起了越来越多的研究关注。为此，大多数现有方案都依赖于对与类别无关的区域建议进行评分，这被表述为一个多实例学习问题。在此过程中，建议得分仅由图像级标签进行汇总和监督，这往往无法精确定位物体边界。在本文中，我们突破了这一限制，深入研究了分数聚合阶段，并提出了针对建议的类别感知空间约束（CSC）方案，该方案以端到端的学习方式集成到弱监督对象检测中。特别是，我们将物体的全局形状信息作为一种无监督约束，从内置的前景和背景线索（称为特定类别像素梯度（CPG）图）中推断出来。具体来说，每个区域建议的权重取决于它对估计物体形状的覆盖程度。对于每个类别，还进一步引入了多中心正则化，以惩罚给定图像中中心集群和高分建议之间的违规行为。我们在最广泛使用的基准 Pascal VOC 和 COCO 上进行了大量实验，结果表明，我们的方法显著改善了弱监督物体检测，既没有为现有模型添加新的可学习参数，也没有改变 CNN 的结构。

{"title":"Category-Aware Spatial Constraint for Weakly Supervised Detection.","authors":"Yunhang Shen, Rongrong Ji, Kuiyuan Yang, Cheng Deng, Changhu Wang","doi":"10.1109/TIP.2019.2933735","DOIUrl":"10.1109/TIP.2019.2933735","url":null,"abstract":"Weakly supervised object detection has attracted increasing research attention recently. To this end, most existing schemes rely on scoring category-independent region proposals, which is formulated as a multiple instance learning problem. During this process, the proposal scores are aggregated and supervised by only image-level labels, which often fails to locate object boundaries precisely. In this paper, we break through such a restriction by taking a deeper look into the score aggregation stage and propose a Category-aware Spatial Constraint (CSC) scheme for proposals, which is integrated into weakly supervised object detection in an end-to-end learning manner. In particular, we incorporate the global shape information of objects as an unsupervised constraint, which is inferred from build-in foreground-and-background cues, termed Category-specific Pixel Gradient (CPG) maps. Specifically, each region proposal is weighted according to how well it covers the estimated shape of objects. For each category, a multi-center regularization is further introduced to penalize the violations between centers cluster and high-score proposals in a given image. Extensive experiments are done on the most widely-used benchmark Pascal VOC and COCO, which shows that our approach significantly improves weakly supervised object detection without adding new learnable parameters to the existing models nor changing the structures of CNNs.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Novel Key-point Detector based on Sparse Coding. 基于稀疏编码的新型关键点检测器

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-19 DOI: 10.1109/TIP.2019.2934891

Thanh Hong-Phuoc, Ling Guan

Most popular hand-crafted key-point detectors such as Harris corner, MSER, SIFT, SURF rely on some specific pre-designed structures for detection of corners, blobs, or junctions in an image. The very nature of pre-designed structures can be considered a source of inflexibility for these detectors in different contexts. Additionally, the performance of these detectors is also highly affected by non-uniform change in illumination. To the best of our knowledge, while there are some previous works addressing one of the two aforementioned problems, there currently lacks an efficient method to solve both simultaneously. In this paper, we propose a novel Sparse Coding based Key-point detector (SCK) which is fully invariant to affine intensity change and independent of any particular structure. The proposed detector locates a key-point in an image, based on a complexity measure calculated from the block surrounding its position. A strength measure is also proposed for comparing and selecting the detected key-points when the maximum number of key-points is limited. In this paper, the desirable characteristics of the proposed detector are theoretically confirmed. Experimental results on three public datasets also show that the proposed detector achieves significantly high performance in terms of repeatability and matching score.

大多数流行的手工制作的关键点检测器，如 Harris corner、MSER、SIFT、SURF 等，都依赖于一些特定的预先设计的结构来检测图像中的角落、斑点或交界处。预先设计结构的本质可以说是这些检测器在不同情况下缺乏灵活性的根源。此外，光照的不均匀变化也会严重影响这些检测器的性能。据我们所知，虽然之前有一些作品解决了上述两个问题中的一个，但目前还缺乏同时解决这两个问题的有效方法。在本文中，我们提出了一种新颖的基于稀疏编码的关键点检测器（SCK），它完全不受仿射强度变化的影响，也不受任何特定结构的影响。所提出的检测器根据图像中关键点位置周围区块计算出的复杂度来定位关键点。本文还提出了一种强度测量方法，用于在关键点数量有限的情况下比较和选择检测到的关键点。本文从理论上证实了所提检测器的理想特性。在三个公共数据集上的实验结果也表明，所提出的检测器在重复性和匹配得分方面都取得了显著的高性能。

{"title":"A Novel Key-point Detector based on Sparse Coding.","authors":"Thanh Hong-Phuoc, Ling Guan","doi":"10.1109/TIP.2019.2934891","DOIUrl":"10.1109/TIP.2019.2934891","url":null,"abstract":"Most popular hand-crafted key-point detectors such as Harris corner, MSER, SIFT, SURF rely on some specific pre-designed structures for detection of corners, blobs, or junctions in an image. The very nature of pre-designed structures can be considered a source of inflexibility for these detectors in different contexts. Additionally, the performance of these detectors is also highly affected by non-uniform change in illumination. To the best of our knowledge, while there are some previous works addressing one of the two aforementioned problems, there currently lacks an efficient method to solve both simultaneously. In this paper, we propose a novel Sparse Coding based Key-point detector (SCK) which is fully invariant to affine intensity change and independent of any particular structure. The proposed detector locates a key-point in an image, based on a complexity measure calculated from the block surrounding its position. A strength measure is also proposed for comparing and selecting the detected key-points when the maximum number of key-points is limited. In this paper, the desirable characteristics of the proposed detector are theoretically confirmed. Experimental results on three public datasets also show that the proposed detector achieves significantly high performance in terms of repeatability and matching score.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Semi-Supervised Deep Coupled Ensemble Learning with Classification Landmark Exploration. 半监督深度耦合集合学习与分类地标探索

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-13 DOI: 10.1109/TIP.2019.2933724

Jichang Li, Si Wu, Cheng Liu, Zhiwen Yu, Hau-San Wong

Using an ensemble of neural networks with consistency regularization is effective for improving performance and stability of deep learning, compared to the case of a single network. In this paper, we present a semi-supervised Deep Coupled Ensemble (DCE) model, which contributes to ensemble learning and classification landmark exploration for better locating the final decision boundaries in the learnt latent space. First, multiple complementary consistency regularizations are integrated into our DCE model to enable the ensemble members to learn from each other and themselves, such that training experience from different sources can be shared and utilized during training. Second, in view of the possibility of producing incorrect predictions on a number of difficult instances, we adopt class-wise mean feature matching to explore important unlabeled instances as classification landmarks, on which the model predictions are more reliable. Minimizing the weighted conditional entropy on unlabeled data is able to force the final decision boundaries to move away from important training data points, which facilitates semi-supervised learning. Ensemble members could eventually have similar performance due to consistency regularization, and thus only one of these members is needed during the test stage, such that the efficiency of our model is the same as the non-ensemble case. Extensive experimental results demonstrate the superiority of our proposed DCE model over existing state-of-the-art semi-supervised learning methods.

与单个网络相比，使用具有一致性正则化的神经网络集合能有效提高深度学习的性能和稳定性。在本文中，我们提出了一种半监督深度耦合合集（DCE）模型，它有助于合集学习和分类地标探索，从而更好地定位所学潜空间中的最终决策边界。首先，我们的 DCE 模型集成了多种互补的一致性正则化，使合奏成员能够相互学习和自我学习，从而在训练过程中共享和利用来自不同来源的训练经验。其次，考虑到在一些困难的实例上可能会产生错误的预测，我们采用了类均值特征匹配来探索重要的未标记实例作为分类地标，在这些地标上，模型的预测会更加可靠。最小化未标注数据的加权条件熵能够迫使最终决策边界远离重要的训练数据点，从而促进半监督学习。由于一致性正则化，合集成员最终可能具有相似的性能，因此在测试阶段只需要其中一个成员，这样我们模型的效率与非合集情况相同。广泛的实验结果表明，我们提出的 DCE 模型优于现有的最先进的半监督学习方法。

{"title":"Semi-Supervised Deep Coupled Ensemble Learning with Classification Landmark Exploration.","authors":"Jichang Li, Si Wu, Cheng Liu, Zhiwen Yu, Hau-San Wong","doi":"10.1109/TIP.2019.2933724","DOIUrl":"10.1109/TIP.2019.2933724","url":null,"abstract":"Using an ensemble of neural networks with consistency regularization is effective for improving performance and stability of deep learning, compared to the case of a single network. In this paper, we present a semi-supervised Deep Coupled Ensemble (DCE) model, which contributes to ensemble learning and classification landmark exploration for better locating the final decision boundaries in the learnt latent space. First, multiple complementary consistency regularizations are integrated into our DCE model to enable the ensemble members to learn from each other and themselves, such that training experience from different sources can be shared and utilized during training. Second, in view of the possibility of producing incorrect predictions on a number of difficult instances, we adopt class-wise mean feature matching to explore important unlabeled instances as classification landmarks, on which the model predictions are more reliable. Minimizing the weighted conditional entropy on unlabeled data is able to force the final decision boundaries to move away from important training data points, which facilitates semi-supervised learning. Ensemble members could eventually have similar performance due to consistency regularization, and thus only one of these members is needed during the test stage, such that the efficiency of our model is the same as the non-ensemble case. Extensive experimental results demonstrate the superiority of our proposed DCE model over existing state-of-the-art semi-supervised learning methods.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deep Learning based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression. 基于深度学习的图像压缩 "画中画 "畸变预测模型

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-13 DOI: 10.1109/TIP.2019.2933743

Huanhua Liu, Yun Zhang, Huan Zhang, Chunling Fan, Sam Kwong, C-C Jay Kuo, Xiaoping Fan

Picture Wise Just Noticeable Difference (PW-JND), which accounts for the minimum difference of a picture that human visual system can perceive, can be widely used in perception-oriented image and video processing. However, the conventional Just Noticeable Difference (JND) models calculate the JND threshold for each pixel or sub-band separately, which may not reflect the total masking effect of a picture accurately. In this paper, we propose a deep learning based PW-JND prediction model for image compression. Firstly, we formulate the task of predicting PW-JND as a multi-class classification problem, and propose a framework to transform the multi-class classification problem to a binary classification problem solved by just one binary classifier. Secondly, we construct a deep learning based binary classifier named perceptually lossy/lossless predictor which can predict whether an image is perceptually lossy to another or not. Finally, we propose a sliding window based search strategy to predict PW-JND based on the prediction results of the perceptually lossy/lossless predictor. Experimental results show that the mean accuracy of the perceptually lossy/lossless predictor reaches 92%, and the absolute prediction error of the proposed PW-JND model is 0.79 dB on average, which shows the superiority of the proposed PW-JND model to the conventional JND models.

图像明智可察觉差值（PW-JND）是指人类视觉系统能感知到的图像最小差值，可广泛应用于以感知为导向的图像和视频处理中。然而，传统的JND（Just Noticeable Difference）模型分别计算每个像素或子波段的JND阈值，可能无法准确反映图片的总体遮蔽效果。本文提出了一种基于深度学习的 PW-JND 预测模型，用于图像压缩。首先，我们将预测 PW-JND 的任务表述为一个多类分类问题，并提出了一个框架，将多类分类问题转化为仅由一个二进制分类器求解的二进制分类问题。其次，我们构建了一个基于深度学习的二元分类器，名为 "感知有损/无损预测器"，它可以预测一幅图像对另一幅图像来说是否是感知有损的。最后，我们提出了一种基于滑动窗口的搜索策略，根据感知有损/无损预测器的预测结果来预测 PW-JND。实验结果表明，感知有损/无损预测器的平均准确率达到 92%，而所提出的 PW-JND 模型的绝对预测误差平均为 0.79 dB，这表明所提出的 PW-JND 模型优于传统的 JND 模型。

{"title":"Deep Learning based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression.","authors":"Huanhua Liu, Yun Zhang, Huan Zhang, Chunling Fan, Sam Kwong, C-C Jay Kuo, Xiaoping Fan","doi":"10.1109/TIP.2019.2933743","DOIUrl":"10.1109/TIP.2019.2933743","url":null,"abstract":"Picture Wise Just Noticeable Difference (PW-JND), which accounts for the minimum difference of a picture that human visual system can perceive, can be widely used in perception-oriented image and video processing. However, the conventional Just Noticeable Difference (JND) models calculate the JND threshold for each pixel or sub-band separately, which may not reflect the total masking effect of a picture accurately. In this paper, we propose a deep learning based PW-JND prediction model for image compression. Firstly, we formulate the task of predicting PW-JND as a multi-class classification problem, and propose a framework to transform the multi-class classification problem to a binary classification problem solved by just one binary classifier. Secondly, we construct a deep learning based binary classifier named perceptually lossy/lossless predictor which can predict whether an image is perceptually lossy to another or not. Finally, we propose a sliding window based search strategy to predict PW-JND based on the prediction results of the perceptually lossy/lossless predictor. Experimental results show that the mean accuracy of the perceptually lossy/lossless predictor reaches 92%, and the absolute prediction error of the proposed PW-JND model is 0.79 dB on average, which shows the superiority of the proposed PW-JND model to the conventional JND models.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multiresolution Localization with Temporal Scanning for Super-Resolution Diffuse Optical Imaging of Fluorescence. 利用时间扫描进行多分辨率定位，实现荧光的超分辨率漫反射光学成像。

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-12 DOI: 10.1109/TIP.2019.2931080

Brian Z Bentz, Dergan Lin, Justin A Patel, Kevin J Webb

A super-resolution optical imaging method is presented that relies on the distinct temporal information associated with each fluorescent optical reporter to determine its spatial position to high precision with measurements of heavily scattered light. This multiple-emitter localization approach uses a diffusion equation forward model in a cost function, and has the potential to achieve micron-scale spatial resolution through centimeters of tissue. Utilizing some degree of temporal separation for the reporter emissions, position and emission strength are determined using a computationally efficient time stripping multiresolution algorithm. The approach circumvents the spatial resolution challenges faced by earlier optical imaging approaches using a diffusion equation forward model, and is promising for in vivo applications. For example, in principle, the method could be used to localize individual neurons firing throughout a rodent brain, enabling direct imaging of neural network activity.

本文介绍了一种超分辨率光学成像方法，该方法依靠与每个荧光光学报告体相关的独特时间信息，通过测量大量散射光来高精度地确定报告体的空间位置。这种多发射器定位方法在成本函数中使用了扩散方程前向模型，有可能在几厘米的组织中实现微米级的空间分辨率。利用报告发射的一定程度的时间分离，使用计算效率高的时间剥离多分辨率算法确定位置和发射强度。这种方法规避了早期使用扩散方程前向模型的光学成像方法所面临的空间分辨率挑战，在体内应用方面大有可为。例如，该方法原则上可用于定位整个啮齿动物大脑中发射的单个神经元，从而实现对神经网络活动的直接成像。

引用次数: 0

Summit Navigator: A Novel Approach for Local Maxima Extraction. 高峰导航仪：提取局部最大值的新方法

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-08-12 DOI: 10.1109/TIP.2019.2932501

Tran Hiep Dinh, Manh Duong Phung, Quang Phuc Ha

This paper presents a novel method, called the Summit Navigator, to effectively extract local maxima of an image histogram for multi-object segmentation of images. After smoothing with a moving average filter, the obtained histogram is analyzed, based on the data density and distribution to find the best observing location. An observability index for each initial peak is proposed to evaluate if it can be considered as dominant by using the calculated observing location. Recursive algorithms are then developed for peak searching and merging to remove any false detection of peaks that are located on one side of each mode. Experimental results demonstrated the advantages of the proposed approach in terms of accuracy and consistency in different reputable datasets.

本文提出了一种名为 "Summit Navigator "的新方法，可有效提取图像直方图的局部最大值，用于图像的多目标分割。在使用移动平均滤波器进行平滑处理后，根据数据密度和分布情况对获得的直方图进行分析，以找到最佳观测位置。我们为每个初始峰值提出了一个可观察性指数，以通过计算出的观察位置来评估是否可将其视为主要峰值。然后开发了用于峰值搜索和合并的递归算法，以消除对位于每个模式一侧的峰值的错误检测。实验结果表明，所提出的方法在不同声誉数据集的准确性和一致性方面具有优势。

引用次数: 0