
Latest publications from the 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Attentional Neural Fields for Crowd Counting
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00581
Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao
Crowd counting has recently gained great popularity in computer vision and is extremely challenging due to the huge scale variations of objects. In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. Within an encoder-decoder network, we introduce conditional random fields (CRFs) to aggregate multi-scale features, which can build more informative representations. To better model pairwise potentials in the CRFs, we incorporate a non-local attention mechanism, implemented as inter- and intra-layer attention, to expand the receptive field to the entire image within the same layer and across different layers, respectively; this captures long-range dependencies to conquer huge scale variations. The CRFs coupled with the attention mechanism are seamlessly integrated into the encoder-decoder network, establishing an ANF that can be optimized end-to-end by backpropagation. We conduct extensive experiments on four public datasets: ShanghaiTech, WorldExpo'10, UCF-CC-50 and UCF-QNRF. The results show that our ANF achieves high counting performance, surpassing most previous methods.
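For concreteness, here is a PyTorch-style sketch of the intra-layer non-local attention that lets every position of a feature map attend to the whole image (the module name, 1×1 projections, scaling, and residual connection are assumptions for illustration, not the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalAttention(nn.Module):
    """Illustrative intra-layer non-local attention: every spatial position
    attends to every other position of the same feature map."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                      # (b, c', hw)
        v = self.value(x).flatten(2).transpose(1, 2)    # (b, hw, c)
        attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (b, hw, hw)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual connection keeps the original signal

x = torch.randn(1, 64, 32, 32)
print(NonLocalAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

The inter-layer variant would follow the same pattern with queries taken from one layer and keys/values from another.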
{"title":"Attentional Neural Fields for Crowd Counting","authors":"Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao","doi":"10.1109/ICCV.2019.00581","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00581","url":null,"abstract":"Crowd counting has recently generated huge popularity in computer vision, and is extremely challenging due to the huge scale variations of objects. In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. Within the encoder-decoder network, we introduce conditional random fields (CRFs) to aggregate multi-scale features, which can build more informative representations. To better model pair-wise potentials in CRFs, we incorperate non-local attention mechanism implemented as inter- and intra-layer attentions to expand the receptive field to the entire image respectively within the same layer and across different layers, which captures long-range dependencies to conquer huge scale variations. The CRFs coupled with the attention mechanism are seamlessly integrated into the encoder-decoder network, establishing an ANF that can be optimized end-to-end by back propagation. We conduct extensive experiments on four public datasets, including ShanghaiTech, WorldEXPO 10, UCF-CC-50 and UCF-QNRF. The results show that our ANF achieves high counting performance, surpassing most previous methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"15 1","pages":"5713-5722"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73993702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 98
Enriched Feature Guided Refinement Network for Object Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00963
Jing Nie, R. Anwer, Hisham Cholakkal, F. Khan, Yanwei Pang, Ling Shao
We propose a single-stage detection framework that jointly tackles the problem of multi-scale object detection and class imbalance. Rather than designing deeper networks, we introduce a simple yet effective feature enrichment scheme to produce multi-scale contextual features. We further introduce a cascaded refinement scheme which first instills multi-scale contextual features into the prediction layers of the single-stage detector in order to enrich their discriminative power for multi-scale detection. Second, the cascaded refinement scheme counters the class imbalance problem by refining the anchors and enriched features to improve classification and regression. Experiments are performed on two benchmarks: PASCAL VOC and MS COCO. For a 320×320 input on the MS COCO test-dev, our detector achieves state-of-the-art single-stage detection accuracy with a COCO AP of 33.2 in the case of single-scale inference, while operating at 21 milliseconds on a Titan XP GPU. For a 512×512 input on the MS COCO test-dev, our approach obtains an absolute gain of 1.6% in terms of COCO AP, compared to the best reported single-stage results[5]. Source code and models are available at: https://github.com/Ranchentx/EFGRNet.
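One plausible way to realize such a feature enrichment scheme, sketched here under the assumption that multi-scale context is gathered with parallel dilated convolutions and fused into the prediction feature by addition (the actual design in the paper may differ):

```python
import torch
import torch.nn as nn

class ContextEnrichment(nn.Module):
    """Parallel dilated convolutions gather context at several receptive
    fields; their sum enriches the single-stage detector's prediction feature."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, feat):
        context = sum(branch(feat) for branch in self.branches)
        return feat + context  # enriched feature fed to the classification/regression heads

feat = torch.randn(1, 256, 40, 40)
print(ContextEnrichment(256)(feat).shape)  # torch.Size([1, 256, 40, 40])
```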
{"title":"Enriched Feature Guided Refinement Network for Object Detection","authors":"Jing Nie, R. Anwer, Hisham Cholakkal, F. Khan, Yanwei Pang, Ling Shao","doi":"10.1109/ICCV.2019.00963","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00963","url":null,"abstract":"We propose a single-stage detection framework that jointly tackles the problem of multi-scale object detection and class imbalance. Rather than designing deeper networks, we introduce a simple yet effective feature enrichment scheme to produce multi-scale contextual features. We further introduce a cascaded refinement scheme which first instills multi-scale contextual features into the prediction layers of the single-stage detector in order to enrich their discriminative power for multi-scale detection. Second, the cascaded refinement scheme counters the class imbalance problem by refining the anchors and enriched features to improve classification and regression. Experiments are performed on two benchmarks: PASCAL VOC and MS COCO. For a 320×320 input on the MS COCO test-dev, our detector achieves state-of-the-art single-stage detection accuracy with a COCO AP of 33.2 in the case of single-scale inference, while operating at 21 milliseconds on a Titan XP GPU. For a 512×512 input on the MS COCO test-dev, our approach obtains an absolute gain of 1.6% in terms of COCO AP, compared to the best reported single-stage results[5]. Source code and models are available at: https://github.com/Ranchentx/EFGRNet.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"10 1","pages":"9536-9545"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75272716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 68
Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00082
Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji
Facial action unit (AU) intensity estimation is a fundamental task for facial behaviour analysis. Most previous methods use the whole face image as input for intensity prediction. Considering that AUs are defined according to their corresponding local appearance, a few patch-based methods utilize image features of local patches. However, fusion of local features is always performed via straightforward feature concatenation or summation. Besides, these methods require fully annotated databases for model learning, which are expensive to acquire. In this paper, we propose a novel weakly supervised patch-based deep model, built on two types of attention mechanisms, for joint intensity estimation of multiple AUs. The model consists of a feature fusion module and a label fusion module. We augment the attention mechanisms of these two modules with a learnable task-related context, as one patch may play different roles in analyzing different AUs and each AU has its own temporal evolution rule. The context-aware feature fusion module is used to capture spatial relationships among local patches, while the context-aware label fusion module is used to capture the temporal dynamics of AUs. The latter enables the model to be trained on a partially annotated database. Experimental evaluations on two benchmark expression databases demonstrate the superior performance of the proposed method.
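A minimal sketch of the context-aware attention idea in the feature fusion module, assuming per-patch features and a learnable task-related context vector per AU (the module name, dimensions, and scoring network are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareFusion(nn.Module):
    """Fuse per-patch features into one AU-specific feature using attention
    weights conditioned on a learnable context vector for that AU."""
    def __init__(self, feat_dim):
        super().__init__()
        self.context = nn.Parameter(torch.randn(feat_dim))  # task-related context
        self.score = nn.Linear(2 * feat_dim, 1)

    def forward(self, patch_feats):                 # (batch, num_patches, feat_dim)
        ctx = self.context.expand(patch_feats.shape[0], patch_feats.shape[1], -1)
        scores = self.score(torch.cat([patch_feats, ctx], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=1)           # attention over patches
        return (weights.unsqueeze(-1) * patch_feats).sum(dim=1)  # fused AU feature

feats = torch.randn(2, 9, 128)                      # 9 patches, 128-d features
print(ContextAwareFusion(128)(feats).shape)          # torch.Size([2, 128])
```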
{"title":"Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data","authors":"Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji","doi":"10.1109/ICCV.2019.00082","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00082","url":null,"abstract":"Facial action unit (AU) intensity estimation is a fundamental task for facial behaviour analysis. Most previous methods use a whole face image as input for intensity prediction. Considering that AUs are defined according to their corresponding local appearance, a few patch-based methods utilize image features of local patches. However, fusion of local features is always performed via straightforward feature concatenation or summation. Besides, these methods require fully annotated databases for model learning, which is expensive to acquire. In this paper, we propose a novel weakly supervised patch-based deep model on basis of two types of attention mechanisms for joint intensity estimation of multiple AUs. The model consists of a feature fusion module and a label fusion module. And we augment attention mechanisms of these two modules with a learnable task-related context, as one patch may play different roles in analyzing different AUs and each AU has its own temporal evolution rule. The context-aware feature fusion module is used to capture spatial relationships among local patches while the context-aware label fusion module is used to capture the temporal dynamics of AUs. The latter enables the model to be trained on a partially annotated database. Experimental evaluations on two benchmark expression databases demonstrate the superior performance of the proposed method.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"733-742"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74712648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00322
Tatsuya Yokota, Kazuya Kawai, M. Sakata, Y. Kimura, H. Hontani
We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) that appropriately constrains the spatial patterns of the resultant images. The proposed method can reconstruct dynamic PET images with a higher signal-to-noise ratio (SNR) and blindly decompose an image matrix into pairs of spatial and temporal factors. The former represent homogeneous tissues with different kinetic parameters, and the latter represent the time-activity curves observed in the corresponding homogeneous tissues. We employ U-Nets combined in parallel for the DIP, and each U-Net is used to extract one spatial factor decomposed from the data matrix. Experimental results show that the proposed method outperforms conventional methods and can extract spatial factors that represent the homogeneous tissues.
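In equation form, the model described above can be summarized roughly as follows (the symbols and the data-fidelity term D are chosen here for illustration and are not copied from the paper):

```latex
\min_{\{\theta_k\},\, V \ge 0}\;
\sum_{t=1}^{T} D\!\left( y_t \;\Big\|\; P\!\Big( \sum_{k=1}^{K} u_k\, v_{k,t} \Big) \right)
\quad \text{s.t.} \quad u_k = f_{\theta_k}(z_k) \ge 0 ,
```

where P is the PET projection operator, y_t is the measured sinogram at frame t, D is a data-fidelity term (e.g., a Poisson log-likelihood), u_k is the k-th non-negative spatial factor generated by the k-th U-Net f with parameters theta_k from a fixed input z_k, and v_{k,t} is the non-negative temporal (time-activity) coefficient of factor k at frame t.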
{"title":"Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior","authors":"Tatsuya Yokota, Kazuya Kawai, M. Sakata, Y. Kimura, H. Hontani","doi":"10.1109/ICCV.2019.00322","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00322","url":null,"abstract":"We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) for appropriately constraining the spatial patterns of resultant images. The proposed method can reconstruct dynamic PET images with higher signal-to-noise ratio (SNR) and blindly decompose an image matrix into pairs of spatial and temporal factors. The former represent homogeneous tissues with different kinetic parameters and the latter represent the time activity curves that are observed in the corresponding homogeneous tissues. We employ U-Nets combined in parallel for DIP and each of the U-nets is used to extract each spatial factor decomposed from the data matrix. Experimental results show that the proposed method outperforms conventional methods and can extract spatial factors that represent the homogeneous tissues.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3126-3135"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73203630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
Face Video Deblurring Using 3D Facial Priors
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00948
Wenqi Ren, Jiaolong Yang, Senyou Deng, D. Wipf, Xiaochun Cao, Xin Tong
Existing face deblurring methods only consider single frames and do not account for facial structure and identity information. These methods struggle to deblur face videos that exhibit significant pose variations and misalignment. In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. The model consists of two main branches: i) a face video deblurring sub-network based on an encoder-decoder architecture, and ii) a 3D face reconstruction and rendering branch for predicting 3D priors of salient facial structures and identity knowledge. These structures encourage the deblurring branch to generate sharp faces with detailed structures. Our method not only uses low-level information (i.e., image intensity), but also middle-level information (i.e., 3D facial structure) and high-level knowledge (i.e., identity content) to further explore spatial constraints of facial components from blurry face frames. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
{"title":"Face Video Deblurring Using 3D Facial Priors","authors":"Wenqi Ren, Jiaolong Yang, Senyou Deng, D. Wipf, Xiaochun Cao, Xin Tong","doi":"10.1109/ICCV.2019.00948","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00948","url":null,"abstract":"Existing face deblurring methods only consider single frames and do not account for facial structure and identity information. These methods struggle to deblur face videos that exhibit significant pose variations and misalignment. In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. The model consists of two main branches: i) a face video deblurring sub-network based on an encoder-decoder architecture, and ii) a 3D face reconstruction and rendering branch for predicting 3D priors of salient facial structures and identity knowledge. These structures encourage the deblurring branch to generate sharp faces with detailed structures. Our method not only uses low-level information (i.e., image intensity), but also middle-level information (i.e., 3D facial structure) and high-level knowledge (i.e., identity content) to further explore spatial constraints of facial components from blurry face frames. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"9387-9396"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73214623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Distillation-Based Training for Multi-Exit Architectures
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00144
Mary Phuong, Christoph H. Lampert
Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits by matching their probability outputs. Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for the late ones. The method is particularly beneficial when training data is limited and also allows a straightforward extension to semi-supervised learning, i.e., also making use of unlabeled data at training time. Moreover, it takes only a few lines to implement and imposes almost no computational overhead at training time, and none at all at test time.
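A minimal sketch of the distillation objective described above, in which each early exit is pulled toward the softened probability output of the deepest exit (the temperature, the weighting alpha, and the mix with cross-entropy are illustrative choices, not necessarily the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def multi_exit_distillation_loss(exit_logits, labels, temperature=3.0, alpha=0.5):
    """exit_logits: list of logits from shallow to deep exits, each (batch, classes).
    Each early exit pays a cross-entropy cost plus a KL term matching the
    softened prediction of the deepest (most accurate) exit."""
    teacher = exit_logits[-1].detach()                  # deepest exit acts as the teacher
    loss = F.cross_entropy(exit_logits[-1], labels)     # deepest exit: plain supervision
    for logits in exit_logits[:-1]:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(
            F.log_softmax(logits / temperature, dim=1),
            F.softmax(teacher / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        loss = loss + alpha * ce + (1 - alpha) * kd
    return loss

logits = [torch.randn(8, 100, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 100, (8,))
print(multi_exit_distillation_loss(logits, labels))
```

Dropping the cross-entropy term for the early exits (alpha = 0) would give the purely distillation-driven variant, which is what makes the unlabeled-data extension straightforward.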
{"title":"Distillation-Based Training for Multi-Exit Architectures","authors":"Mary Phuong, Christoph H. Lampert","doi":"10.1109/ICCV.2019.00144","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00144","url":null,"abstract":"Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits, by matching their probability outputs. Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones. The method is particularly beneficial when training data is limited and also allows a straight-forward extension to semi-supervised learning, i.e. make use also of unlabeled data at training time. Moreover, it takes only a few lines to implement and imposes almost no computational overhead at training time, and none at all at test time.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1355-1364"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75313299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 120
Multi-Class Part Parsing With Joint Boundary-Semantic Awareness
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00927
Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian
Object part parsing in the wild, which requires simultaneously detecting multiple object classes in a scene and accurately segmenting the semantic parts within each class, is challenging due to the joint presence of class-level and part-level ambiguities. Despite its importance, however, this problem is not sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization; these features are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category, preventing irrelevant features from being merged together. The proposed modules are lightweight and implementation-friendly, improving the performance substantially when plugged into various baseline architectures. Without bells and whistles, the full model sets new state-of-the-art results on the Pascal-Part dataset, in both the multi-class and the conventional single-class settings, while running substantially faster than recent high-performance approaches.
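A hedged sketch of the boundary-awareness idea, in which a mid-level feature map is reweighted by a predicted soft part-boundary map before fusion with high-level features (the 1×1 prediction head and the residual reweighting are assumptions for illustration, not the paper's exact module):

```python
import torch
import torch.nn as nn

class BoundaryAttention(nn.Module):
    """Predict a soft part-boundary map from a mid-level feature and use it
    to emphasize responses near part boundaries."""
    def __init__(self, channels):
        super().__init__()
        self.boundary_head = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, mid_feat):
        boundary = self.boundary_head(mid_feat)   # (b, 1, h, w), values in [0, 1]
        attended = mid_feat * (1 + boundary)      # boost features around part boundaries
        return attended, boundary

feat = torch.randn(1, 256, 60, 60)
out, bmap = BoundaryAttention(256)(feat)
print(out.shape, bmap.shape)  # torch.Size([1, 256, 60, 60]) torch.Size([1, 1, 60, 60])
```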
{"title":"Multi-Class Part Parsing With Joint Boundary-Semantic Awareness","authors":"Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian","doi":"10.1109/ICCV.2019.00927","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00927","url":null,"abstract":"Object part parsing in the wild, which requires to simultaneously detect multiple object classes in the scene and accurately segments semantic parts within each class, is challenging for the joint presence of class-level and part-level ambiguities. Despite its importance, however, this problem is not sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization, which are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category to prevent irrelevant features being merged together. The proposed modules are lightweight and implementation friendly, improving the performance substantially when plugged into various baseline architectures. Without bells and whistles, the full model sets new state-of-the-art results on the Pascal-Part dataset, in both multi-class and the conventional single-class setting, while running substantially faster than recent high-performance approaches.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"2 1","pages":"9176-9185"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75479109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Elaborate Monocular Point and Line SLAM With Robust Initialization
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00121
Sang Jun Lee, S. Hwang
This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. For initialization, we utilize a matrix factorization-based method. Matrix factorization-based methods require that extracted feature points be tracked in all used frames. Since consistent tracking is difficult in challenging environments, a geometric interpolation that utilizes epipolar geometry is proposed. For localization, 3D lines are utilized. We propose the use of Plücker line coordinates to represent the geometric information of lines. We also propose an orthonormal representation of Plücker line coordinates and Jacobians of lines for better optimization. Experimental results show that the proposed initialization generates a consistent and robust map in linear time with fast convergence, even in challenging scenes. Localization using the proposed line representations is faster, more accurate and more memory efficient than other state-of-the-art methods.
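For reference, Plücker coordinates represent a 3D line by a direction vector and a moment vector. A small NumPy sketch of building them from two points and measuring a point-to-line residual (this covers only the coordinates themselves, not the orthonormal 4-DoF parameterization used for optimization, and the function names are chosen here for illustration):

```python
import numpy as np

def plucker_from_points(p1, p2):
    """Plücker coordinates (d, m) of the 3D line through p1 and p2:
    d is the line direction, m = p x d is the moment, and d . m = 0 holds."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    m = np.cross(p1, d)
    return d, m

def point_to_line_distance(p, d, m):
    """Distance from point p to the line (d, m); usable as a residual when
    optimizing line landmarks."""
    p = np.asarray(p, float)
    return np.linalg.norm(np.cross(p, d) - m) / np.linalg.norm(d)

d, m = plucker_from_points([0, 0, 0], [1, 0, 0])
print(d, m, point_to_line_distance([0, 1, 0], d, m))  # [1. 0. 0.] [0. 0. 0.] 1.0
```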
{"title":"Elaborate Monocular Point and Line SLAM With Robust Initialization","authors":"Sang Jun Lee, S. Hwang","doi":"10.1109/ICCV.2019.00121","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00121","url":null,"abstract":"This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. For initialization, we utilize a matrix factorization-based method. Matrix factorization-based methods require that extracted feature points must be tracked in all used frames. Since consistent tracking is difficult in challenging environments, a geometric interpolation that utilizes epipolar geometry is proposed. For localization, 3D lines are utilized. We propose the use of Plu ̈cker line coordinates to represent geometric information of lines. We also propose orthonormal representation of Plu ̈cker line coordinates and Jacobians of lines for better optimization. Experimental results show that the proposed initialization generates consistent and robust map in linear time with fast convergence even in challenging scenes. And localization using proposed line representations is faster, more accurate and memory efficient than other state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"56 1","pages":"1121-1129"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01073
Jingyun Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu
The main obstacles to accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured nature of chest X-ray images and localizes diseases via contrastive learning on aligned positive and negative samples. To force the attention module to focus only on sites of abnormalities, we also introduce a learnable alignment module to adjust all the input images, which eliminates variations in scale, angle, and displacement of X-ray images generated under bad scan conditions. We show that the contrastive attention and alignment modules allow the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance on the NIH chest X-ray dataset.
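To make "contrast induced" attention concrete, here is a heavily simplified sketch. Treating the attention map as the normalized per-location feature difference between an abnormal image and an aligned normal reference is an illustrative assumption, not the paper's exact formulation:

```python
import torch

def contrast_induced_attention(pos_feat, neg_feat):
    """Illustrative contrast-induced attention: the per-location magnitude of the
    difference between an abnormal (positive) feature map and a spatially aligned
    normal (negative) feature map, normalized to [0, 1] per image.
    pos_feat, neg_feat: (batch, channels, h, w)."""
    diff = (pos_feat - neg_feat).abs().mean(dim=1, keepdim=True)  # (b, 1, h, w)
    flat = diff.view(diff.shape[0], -1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    flat = (flat - lo) / (hi - lo + 1e-6)
    return flat.view_as(diff)

pos = torch.randn(2, 64, 16, 16)
neg = torch.randn(2, 64, 16, 16)
print(contrast_induced_attention(pos, neg).shape)  # torch.Size([2, 1, 16, 16])
```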
{"title":"Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision","authors":"Jingyun Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu","doi":"10.1109/ICCV.2019.01073","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01073","url":null,"abstract":"Obstacles facing accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on the aligned positive and negative samples. To force the attention module to focus only on sites of abnormalities, we also introduce a learnable alignment module to adjust all the input images, which eliminates variations of scales, angles, and displacements of X-ray images generated under bad scan conditions. We show that the use of contrastive attention and alignment module allows the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance in NIH chest X-ray dataset.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"177 1","pages":"10631-10640"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78031119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
Adaptive Density Map Generation for Crowd Counting
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00122
Jia Wan, Antoni B. Chan
Crowd counting is an important topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). Most research efforts have concentrated on the density map estimation problem, while the problem of density map generation has not been adequately explored. In particular, the density map can be considered an intermediate representation used to train a crowd counting network. In the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. The experimental results on popular counting datasets confirm the effectiveness of the proposed learnable density map representations.
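For reference, the conventional hand-crafted step that this work replaces, generating a ground-truth density map by convolving the dot map with a fixed Gaussian kernel, can be sketched as follows (the bandwidth sigma=4.0 is an arbitrary illustrative choice, and SciPy is used for the convolution):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map_from_dots(dot_coords, height, width, sigma=4.0):
    """Ground-truth density map: place a unit impulse at every annotated head
    position (x, y) and convolve with a Gaussian; the map still sums to the count."""
    dots = np.zeros((height, width), dtype=np.float64)
    for x, y in dot_coords:
        dots[int(round(y)), int(round(x))] += 1.0
    return gaussian_filter(dots, sigma=sigma, mode="constant")

density = density_map_from_dots([(30, 40), (100, 80)], height=128, width=128)
print(density.sum())  # ~2.0: integrating the density map recovers the crowd count
```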
{"title":"Adaptive Density Map Generation for Crowd Counting","authors":"Jia Wan, Antoni B. Chan","doi":"10.1109/ICCV.2019.00122","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00122","url":null,"abstract":"Crowd counting is an important topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). Most research efforts have concentrated on the density map estimation problem, while the problem of density map generation has not been adequately explored. In particular, the density map could be considered as an intermediate representation used to train a crowd counting network. In the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. The experiment results on popular counting datasets confirm the effectiveness of the proposed learnable density map representations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"167 1","pages":"1130-1139"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80492265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 132