Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, D. Doermann
Rapidly decreasing computation and memory costs have recently driven the success of many applications in the field of deep learning. Practical deployment of deep learning on resource-limited hardware, such as embedded devices and smartphones, however, remains challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability of our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to state-of-the-art methods such as XNOR, CBCNs achieve up to 10% higher top-1 accuracy with more powerful representational ability.
{"title":"Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation","authors":"Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, D. Doermann","doi":"10.1109/CVPR.2019.00280","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00280","url":null,"abstract":"The rapidly decreasing computation and memory cost has recently driven the success of many applications in the field of deep learning. Practical applications of deep learning in resource-limited hardware, such as embedded devices and smart phones, however, remain challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between the 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability in our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to the state-of-the-art such as XNOR, CBCNs can achieve up to 10% higher top-1 accuracy with more powerful representational ability.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"14 1","pages":"2686-2694"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89522932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, N. Sebe, Jian Yang
In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. The motivation behind it comes from the statistical observation that pattern-affinitive pairs recur frequently across different tasks as well as within a task. Thus, we can conduct two types of propagation, cross-task propagation and task-specific propagation, to adaptively diffuse those similar patterns. The former integrates cross-task affinity patterns to adapt to each task through the calculation of non-local relationships. The latter then performs an iterative diffusion in the feature space so that the cross-task affinity patterns can be widely spread within the task. Accordingly, the learning of each task can be regularized and boosted by the complementary task-level affinities. Extensive experiments demonstrate the effectiveness and superiority of our method on the three joint tasks. Meanwhile, we achieve state-of-the-art or competitive results on three related datasets: NYUD-v2, SUN-RGBD and KITTI.
{"title":"Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation","authors":"Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, N. Sebe, Jian Yang","doi":"10.1109/CVPR.2019.00423","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00423","url":null,"abstract":"In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. The motivation behind it comes from the statistic observation that pattern-affinitive pairs recur much frequently across different tasks as well as within a task. Thus, we can conduct two types of propagations, cross-task propagation and task-specific propagation, to adaptively diffuse those similar patterns. The former integrates cross-task affinity patterns to adapt to each task therein through the calculation on non-local relationships. Next the latter performs an iterative diffusion in the feature space so that the cross-task affinity patterns can be widely-spread within the task. Accordingly, the learning of each task can be regularized and boosted by the complementary task-level affinities. Extensive experiments demonstrate the effectiveness and the superiority of our method on the joint three tasks. Meanwhile, we achieve the state-of-the-art or competitive results on the three related datasets, NYUD-v2, SUN-RGBD and KITTI.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"63 6 1","pages":"4101-4110"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76694877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Y. Qiao
Recent studies have shown that context features can significantly improve the performance of deep semantic segmentation networks. Current context-based segmentation methods differ from each other in how they construct context features, and they perform differently in practice. This paper first introduces three desirable properties of context features for the segmentation task. In particular, we find that Global-guided Local Affinity (GLA) plays a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes the Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector from these affinities. We empirically evaluate APCNet on three semantic segmentation and scene parsing datasets: PASCAL VOC 2012, Pascal-Context, and ADE20K. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and obtains a new record of 84.2% on the PASCAL VOC 2012 test set without MS COCO pre-training or any post-processing.
{"title":"Adaptive Pyramid Context Network for Semantic Segmentation","authors":"Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Y. Qiao","doi":"10.1109/CVPR.2019.00770","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00770","url":null,"abstract":"Recent studies witnessed that context features can significantly improve the performance of deep semantic segmentation networks. Current context based segmentation methods differ with each other in how to construct context features and perform differently in practice. This paper firstly introduces three desirable properties of context features in segmentation task. Specially, we find that Global-guided Local Affinity (GLA) can play a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as a guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector with these affinities. We empirically evaluate our APCNet on three semantic segmentation and scene parsing datasets, including PASCAL VOC 2012, Pascal-Context, and ADE20K dataset. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and obtains a new record 84.2% on PASCAL VOC 2012 test set without MS COCO pre-trained and any post-processing.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"7511-7520"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75383395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhaofan Qiu, Ting Yao, C. Ngo, Xinmei Tian, Tao Mei
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even more pronounced for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning with Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. The diffusions allow the two aspects of information, i.e., the localized and the holistic, to interact, enabling more powerful representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements over the best competitors on the large-scale Kinetics-400 and Kinetics-600 video classification datasets, by 3.5% and 0.7%, respectively. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection. Superior performance over several state-of-the-art techniques is reported on these benchmarks.
{"title":"Learning Spatio-Temporal Representation With Local and Global Diffusion","authors":"Zhaofan Qiu, Ting Yao, C. Ngo, Xinmei Tian, Tao Mei","doi":"10.1109/CVPR.2019.01233","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01233","url":null,"abstract":"Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations while ignoring the large-range dependency. Such drawback becomes even worse particularly for video recognition, since video is an information-intensive media with complex temporal variations. In this paper, we present a novel framework to boost the spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. Diffusions effectively interact two aspects of information, i.e., localized and holistic, for more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performances over several state-of-the-art techniques on these benchmarks are reported.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"66 1","pages":"12048-12057"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77967732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao
Medical image analysis has two important research areas: disease grading and fine-grained lesion segmentation. Although the former problem often relies on the latter, the two are usually studied separately. Disease severity grading can be treated as a classification problem, which only requires image-level annotations, while lesion segmentation requires stronger pixel-level annotations. However, pixel-wise annotation of medical images is highly time-consuming and requires domain experts. In this paper, we propose a collaborative learning method that jointly improves the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. Given a small set of pixel-level annotated data, a multi-lesion mask generation model first performs the traditional semantic segmentation task. Then, based on the initially predicted lesion maps for large quantities of image-level annotated data, a lesion-attentive disease grading model is designed to improve the severity classification accuracy. Meanwhile, the lesion attention model can refine the lesion maps using class-specific information to fine-tune the segmentation model in a semi-supervised manner. An adversarial architecture is also integrated for training. With extensive experiments on a representative medical problem, diabetic retinopathy (DR), we validate the effectiveness of our method and achieve consistent improvements over state-of-the-art methods on three public datasets.
{"title":"Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images","authors":"Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao","doi":"10.1109/CVPR.2019.00218","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00218","url":null,"abstract":"Medical image analysis has two important research areas: disease grading and fine-grained lesion segmentation. Although the former problem often relies on the latter, the two are usually studied separately. Disease severity grading can be treated as a classification problem, which only requires image-level annotations, while the lesion segmentation requires stronger pixel-level annotations. However, pixel-wise data annotation for medical images is highly time-consuming and requires domain experts. In this paper, we propose a collaborative learning method to jointly improve the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. Given a small set of pixel-level annotated data, a multi-lesion mask generation model first performs the traditional semantic segmentation task. Then, based on initially predicted lesion maps for large quantities of image-level annotated data, a lesion attentive disease grading model is designed to improve the severity classification accuracy. Meanwhile, the lesion attention model can refine the lesion maps using class-specific information to fine-tune the segmentation model in a semi-supervised manner. An adversarial architecture is also integrated for training. With extensive experiments on a representative medical problem called diabetic retinopathy (DR), we validate the effectiveness of our method and achieve consistent improvements over state-of-the-art methods on three public datasets.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"14 1","pages":"2074-2083"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74890530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew Trager, M. Hebert, J. Ponce
We present a coordinate-free description of Carlsson-Weinshall duality between scene points and camera pinholes and use it to derive a new characterization of primal/dual multi-view geometry. In the case of three views, a particular set of reduced trilinearities provides a novel parameterization of camera geometry that, unlike existing ones, is subject only to very simple internal constraints. These trilinearities lead to new "quasi-linear" algorithms for primal and dual structure from motion. We include some preliminary experiments with real and synthetic data.
"Coordinate-Free Carlsson-Weinshall Duality and Relative Multi-View Geometry," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 225-233. doi:10.1109/CVPR.2019.00031
Guosen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao
Zero-shot learning (ZSL) aims to classify images from unseen categories using only seen-class images as training data. Existing works on ZSL mainly leverage global features or learn global regions, from which the embeddings to the semantic space are constructed. However, few of them study the discriminative power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen and unseen classes. In this paper, to discover such (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed to suppress redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into AREN. In comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performance under the ZSL setting and compelling results under the generalized ZSL setting.
{"title":"Attentive Region Embedding Network for Zero-Shot Learning","authors":"Guosen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao","doi":"10.1109/CVPR.2019.00961","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00961","url":null,"abstract":"Zero-shot learning (ZSL) aims to classify images from unseen categories, by merely utilizing seen class images as the training data. Existing works on ZSL mainly leverage the global features or learn the global regions, from which, to construct the embeddings to the semantic space. However, few of them study the discrimination power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen/unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream, and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed for suppressing redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into the AREN. In the comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performances under ZSL setting, and compelling results under generalized ZSL setting.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"9376-9385"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75735818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fandong Zhang, Ling Luo, Xinwei Sun, Zhen Zhou, Xiuli Li, Yizhou Yu, Yizhou Wang
Accurate microcalcification (μC) detection is of great importance because μCs appear in a high proportion of early breast cancers. Most previous μC detection methods are discriminative models, where classifiers are exploited to distinguish μCs from other backgrounds. However, it is still challenging for these methods to separate μCs from the large amount of normal tissue because they are tiny (at most 14 pixels). Generative methods can precisely model the normal tissue and regard abnormal regions as outliers, but they fail to further distinguish μCs from other anomalies, e.g., vessel calcifications. In this paper, we propose a hybrid approach that takes advantage of both generative and discriminative models. First, a generative model named the Anomaly Separation Network (ASN) is used to generate candidate μCs. ASN contains two major components: a deep convolutional encoder-decoder network is built to learn the image reconstruction mapping, and a t-test loss function is designed to separate the distributions of the reconstruction residuals of μCs from those of normal tissue. Second, a discriminative model is cascaded to distinguish μCs from the false positives. Finally, to verify the effectiveness of our method, we conduct experiments on both public and in-house datasets, which demonstrate that our approach outperforms previous state-of-the-art methods.
"Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12570-12578. doi:10.1109/CVPR.2019.01286
Xionghui Wang, Jianfang Hu, J. Lai, Jianguo Zhang, Weishi Zheng
The goal of early action prediction is to recognize actions from partially observed videos with incomplete action executions, which is quite different from action recognition. Predicting early actions is very challenging since the partially observed videos do not contain enough action information for recognition. In this paper, we aim to improve early action prediction by proposing a novel teacher-student learning framework. Our framework involves a teacher model for recognizing actions from full videos, a student model for predicting early actions from partial videos, and a teacher-student learning block for distilling progressive knowledge from teacher to student across the two tasks. Extensive experiments on three public action datasets show that the proposed progressive teacher-student learning framework consistently improves the performance of the early action prediction model. We also report state-of-the-art performance for early action prediction on all of these datasets.
{"title":"Progressive Teacher-Student Learning for Early Action Prediction","authors":"Xionghui Wang, Jianfang Hu, J. Lai, Jianguo Zhang, Weishi Zheng","doi":"10.1109/CVPR.2019.00367","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00367","url":null,"abstract":"The goal of early action prediction is to recognize actions from partially observed videos with incomplete action executions, which is quite different from action recognition. Predicting early actions is very challenging since the partially observed videos do not contain enough action information for recognition. In this paper, we aim at improving early action prediction by proposing a novel teacher-student learning framework. Our framework involves a teacher model for recognizing actions from full videos, a student model for predicting early actions from partial videos, and a teacher-student learning block for distilling progressive knowledge from teacher to student, crossing different tasks. Extensive experiments on three public action datasets show that the proposed progressive teacher-student learning framework can consistently improve performance of early action prediction model. We have also reported the state-of-the-art performances for early action prediction on all of these sets.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"3551-3560"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72775209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ping Wei, Huan Li, Ping Hu
Handwritten signature verification is an important technique for many financial, commercial, and forensic applications. In this paper, we propose an inverse discriminative network (IDN) for writer-independent handwritten signature verification, which aims to determine whether a test signature is genuine or forged with respect to a reference signature. The IDN model contains four weight-shared neural network streams: the two that receive the original signature images are the discriminative streams, and the other two, which process the gray-inverted images, form the inverse streams. Multiple paths of attention modules connect the discriminative streams and the inverse streams to propagate messages. With the inverse streams and the multi-path attention modules, the IDN model intensifies the information that is effective for signature verification. Since there was no proper Chinese signature dataset in the community, we collected a large-scale Chinese signature dataset with approximately 29,000 images of 749 individuals' signatures. We test our method on the Chinese signature dataset and three other signature datasets in different languages: CEDAR, BHSig-B, and BHSig-H. Experiments demonstrate the strength and potential of our method.
"Inverse Discriminative Networks for Handwritten Signature Verification," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5757-5765. doi:10.1109/CVPR.2019.00591