
Latest Publications — 2017 IEEE International Conference on Computer Vision (ICCV)

Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.63
Sijia Cai, W. Zuo, Lei Zhang
The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial kernel based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective on combining convolutional activations from multiple layers. While hypercolumns simply concatenate maps from different layers, and holistically-nested networks use weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields more discriminative representations and achieves competitive results on widely used FGVC datasets.
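A minimal NumPy sketch of the degree-2 building block described above, under stated assumptions: a random array stands in for a real convolutional activation map, and the channel count, grid size, and seed are illustrative. Each spatial location is treated as a local descriptor, and the averaged outer product of descriptors is the second-order statistic on which a degree-2 polynomial kernel predictor operates; the paper's full method additionally fuses such kernels across layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a convolutional activation map: C channels, H x W grid.
C, H, W = 64, 7, 7
feat = rng.standard_normal((C, H, W))

# Treat each spatial location as a local descriptor.
X = feat.reshape(C, -1).T                  # (H*W, C)

# Second-order statistics: the Frobenius inner product between two such
# matrices equals the average degree-2 polynomial kernel <x, y>^2 between
# the two descriptor sets, so this matrix encodes pairwise channel interactions.
order2 = (X.T @ X) / X.shape[0]            # (C, C)

# Flatten and l2-normalize to obtain a fixed-length image representation.
rep = order2.reshape(-1)
rep /= np.linalg.norm(rep) + 1e-12
print(rep.shape)                           # (4096,)
```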
Citations: 163
Monocular Video-Based Trailer Coupler Detection Using Multiplexer Convolutional Neural Network
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.584
Yousef Atoum, Joseph Roth, Michael Bliss, Wende Zhang, Xiaoming Liu
This paper presents an automated monocular-camera-based computer vision system for autonomously backing a vehicle up to a trailer by continuously estimating the 3D trailer coupler position and feeding it to the vehicle control system until the tow hitch is aligned with the trailer's coupler. This system is made possible by our proposed distance-driven Multiplexer-CNN method, which selects the most suitable CNN using the estimated coupler-to-vehicle distance. The input of the multiplexer is a group consisting of a CNN detector, trackers, and a 3D localizer. In the CNN detector, we propose a novel algorithm that provides a presence confidence score with each detection. The score reflects the existence of the target object in a region, as well as how accurate the 2D target detection is. We demonstrate the accuracy and efficiency of the system on a large trailer database. Our system achieves an estimation error of 1.4 cm when the ball reaches the coupler, while running at 18.9 FPS on a regular PC.
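A minimal sketch of the distance-driven multiplexing idea, assuming hand-picked distance thresholds and placeholder components; the actual system selects among trained CNNs (a detector, trackers, and a 3D localizer) rather than these stubs.

```python
# Placeholder components standing in for the trained CNNs in the paper.
def detect_far(frame):
    return "coarse 2D coupler detection"

def track_mid(frame):
    return "2D coupler tracking"

def localize_near(frame):
    return "3D coupler localization"

def multiplexer(frame, distance_m):
    # The estimated coupler-to-vehicle distance selects the component;
    # the 5.0 m and 1.0 m thresholds here are illustrative assumptions.
    if distance_m > 5.0:
        return detect_far(frame)
    if distance_m > 1.0:
        return track_mid(frame)
    return localize_near(frame)

print(multiplexer(frame=None, distance_m=0.5))   # "3D coupler localization"
```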
Citations: 8
Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.464
Alessandro Penna, Sadegh Mohammadi, N. Jojic, Vittorio Murino
A popular approach to training classifiers for new image classes is to take the lower layers of a pre-trained feed-forward neural network and retrain only the top. Thus, most layers simply serve as highly nonlinear feature extractors. While these features have been found useful for classifying a variety of scenes and objects, previous work has also demonstrated unusual levels of sensitivity to the input, especially for images that veer too far from the training distribution. This can lead to surprising results, as an imperceptible change in an image can be enough to completely change the predicted class. This occurs in particular in applications involving personal data, typically acquired with wearable cameras (e.g., visual lifelogs), where the problem is further complicated by the dearth of new labeled training data, which makes supervised learning with deep models difficult. To alleviate these problems, in this paper we propose a new generative model that captures the feature distribution in new data. Its latent space then becomes more representative of the new data, while still retaining the generalization properties. In particular, we use constrained Markov walks over a counting grid to model image sequences, which not only yield good latent representations, but also allow for excellent classification with only a handful of labeled training examples of the new scenes or objects, a scenario typical of lifelogging applications.
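A toy NumPy sketch of a constrained Markov walk over a counting grid, under simplifying assumptions: a small wrap-around grid, moves limited to staying put or stepping to a 4-neighbour, and random multinomials in place of learned ones. The forward algorithm scores how well a feature sequence fits smooth walks over the grid, which is the kind of quantity one would compare across classes.

```python
import numpy as np

# Toy counting grid: G x G cells, each with a multinomial over K feature "words".
rng = np.random.default_rng(1)
G, K = 8, 32
grid = rng.dirichlet(np.ones(K), size=(G, G))      # p(word | cell)

# Constrained walk: from a cell you may stay or move to a 4-neighbour, which
# encodes the temporal smoothness of a wearable-camera stream.
def neighbours(i, j):
    for di, dj in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:
        yield (i + di) % G, (j + dj) % G           # wrap-around grid

def sequence_loglik(words):
    """Forward algorithm over the constrained walk; words are feature indices."""
    alpha = np.full((G, G), 1.0 / (G * G)) * grid[:, :, words[0]]
    for w in words[1:]:
        new = np.zeros((G, G))
        for i in range(G):
            for j in range(G):
                for ni, nj in neighbours(i, j):
                    new[ni, nj] += alpha[i, j] / 5.0   # uniform over the 5 moves
        alpha = new * grid[:, :, w]
    return np.log(alpha.sum())

print(sequence_loglik(rng.integers(0, K, size=10).tolist()))
```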
Citations: 5
AnnArbor: Approximate Nearest Neighbors Using Arborescence Coding
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.523
Artem Babenko, V. Lempitsky
To compress large datasets of high-dimensional descriptors, modern quantization schemes learn multiple codebooks and then represent individual descriptors as combinations of codewords. Once the codebooks are learned, these schemes encode descriptors independently. In contrast, we present a new coding scheme that arranges dataset descriptors into a set of arborescence graphs, and then encodes non-root descriptors by quantizing their displacements with respect to their parent nodes. By optimizing the structure of the arborescences, our coding scheme can decrease the quantization error considerably, while incurring only minimal overhead in memory footprint and nearest-neighbor search speed in the compressed dataset compared to independent quantization. The advantage of the proposed scheme is demonstrated in a series of experiments with datasets of SIFT and deep descriptors.
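A simplified NumPy sketch of displacement coding over an arborescence, with assumptions noted in the comments: a spanning tree grown greedily by Prim's algorithm replaces the paper's optimized arborescence structure, and a random codebook replaces a learned one. In this simplified decoder the quantization error accumulates along each root-to-leaf path.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 50, 16, 8                        # descriptors, dimension, codebook size
X = rng.standard_normal((N, D))            # stand-in dataset descriptors

# Grow a spanning arborescence rooted at descriptor 0 with Prim's algorithm.
parent = np.full(N, -1)
in_tree = np.zeros(N, dtype=bool); in_tree[0] = True
dist = np.linalg.norm(X - X[0], axis=1)
near = np.zeros(N, dtype=int)
order = [0]                                # insertion order: parents come first
for _ in range(N - 1):
    u = int(np.argmin(np.where(in_tree, np.inf, dist)))
    in_tree[u] = True; parent[u] = near[u]; order.append(u)
    d = np.linalg.norm(X - X[u], axis=1)
    closer = d < dist
    dist[closer] = d[closer]; near[closer] = u

# Encode each non-root descriptor by quantizing its displacement from its parent.
codebook = rng.standard_normal((K, D))     # random here; learned in the paper
codes = np.zeros(N, dtype=int)
for u in order[1:]:
    disp = X[u] - X[parent[u]]
    codes[u] = int(np.argmin(((codebook - disp) ** 2).sum(axis=1)))

# Decode by walking the tree from the root, adding quantized displacements.
recon = np.zeros_like(X); recon[0] = X[0]
for u in order[1:]:
    recon[u] = recon[parent[u]] + codebook[codes[u]]
print(np.linalg.norm(X - recon) / np.linalg.norm(X))   # relative error
```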
Citations: 7
Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.377
Chunluan Zhou, Junsong Yuan
Detecting partially occluded pedestrians remains a challenging problem due to the variations and uncertainties of partial occlusion patterns. Following the commonly used framework of handling partial occlusion by part detection, we propose a multi-label learning approach that jointly learns part detectors to capture partial occlusion patterns. The part detectors share a set of decision trees via boosting, which exploits part correlations and also reduces the computational cost of applying these detectors. The learned decision trees capture the overall distribution of all parts. When used individually as pedestrian detectors, our jointly learned part detectors outperform their separately learned counterparts across different occlusion situations. The learned part detectors can be further integrated to better detect partially occluded pedestrians. Experiments on the Caltech dataset show state-of-the-art performance of our approach for detecting heavily occluded pedestrians.
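A minimal sketch of boosting with shared tree structure, assuming scikit-learn is available and random features stand in for real detection-window features. A multi-output regression tree uses one set of splits for all labels while keeping per-label leaf values, which mirrors the part detectors sharing a set of boosted trees while producing separate part scores; the paper's actual learner and loss may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
N, D, P = 500, 20, 4                       # detection windows, features, parts
X = rng.standard_normal((N, D))
Y = np.where(rng.random((N, P)) < 0.3, 1.0, -1.0)   # +-1 part-visibility labels

# Shared-structure boosting: each multi-output tree has one set of splits
# for all parts but per-part leaf values.
F = np.zeros((N, P))
lr, trees = 0.1, []
for _ in range(50):
    grad = Y / (1.0 + np.exp(Y * F))       # negative gradient of logistic loss
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, grad)
    trees.append(tree)
    F += lr * tree.predict(X)

scores = 1.0 / (1.0 + np.exp(-F))          # per-part detection scores in (0, 1)
print(scores.shape)                        # (500, 4)
```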
Citations: 112
Transferring Objects: Joint Inference of Container and Human Pose
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.319
Hanqing Wang, Wei Liang, L. Yu
Transferring objects from one place to another is a common task humans perform in daily life. During this process, humans usually find it intuitive to choose a proper object as a container and to use an efficient pose to carry objects; yet this is non-trivial for current computer vision and machine learning algorithms. In this paper, we propose an approach that jointly infers the container and human pose for transferring objects by minimizing the costs associated with both object and pose candidates. Our approach predicts which object to choose as a container while reasoning about how humans interact with their physical surroundings to accomplish the transfer task given visual input. In the learning phase, the presented method learns, via a structured learning approach, how humans make rational choices of containers and poses for transferring different objects, as well as the physical quantities required by the transfer task (e.g., compatibility between container and containee, and the energy cost of a carrying pose). In the inference phase, given a scanned 3D scene with different object candidates and a dictionary of human poses, our approach infers the best object to serve as a container, together with the human pose for transferring a given object.
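A minimal sketch of the joint minimization over the two candidate sets, assuming hand-set costs in place of the learned compatibility and pose-energy terms; the candidate names and numbers are purely illustrative.

```python
import itertools

# Hypothetical candidates with hand-set costs standing in for the learned
# terms: container-containee compatibility and carrying-pose energy.
container_cost = {"tray": 0.2, "box": 0.1, "bucket": 0.4}
pose_cost = {"one-hand": 0.5, "two-hands": 0.3, "shoulder": 0.6}

def joint_cost(container, pose):
    # Joint objective over both candidate sets, mirroring the paper's
    # minimization over object and pose candidates.
    return container_cost[container] + pose_cost[pose]

best = min(itertools.product(container_cost, pose_cost),
           key=lambda cp: joint_cost(*cp))
print(best)                                # ('box', 'two-hands')
```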
Citations: 10
Multi-label Image Recognition by Recurrently Discovering Attentional Regions
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.58
Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin
This paper proposes a novel deep architecture for multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer that locates attentional regions from the convolutional feature maps in a region-proposal-free way, and ii) an LSTM (Long Short-Term Memory) sub-network that sequentially predicts semantic labeling scores on the located regions while capturing the global dependencies among them. The LSTM also outputs the parameters for computing the spatial transformer. On large-scale benchmarks for multi-label image classification (e.g., MS-COCO and PASCAL VOC 07), our approach demonstrates superior performance over existing state-of-the-art methods in both accuracy and efficiency.
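A NumPy sketch of the recurrent attend-then-score control flow, under heavy simplifications: the affine spatial transformer is reduced to an axis-aligned square crop, all weights are random, and the step count and sizes are illustrative, so this only shows how LSTM state drives region selection and label scoring.

```python
import numpy as np

rng = np.random.default_rng(4)
C, H, W, hid, L = 8, 14, 14, 32, 5        # channels, map size, LSTM size, labels
fmap = rng.standard_normal((C, H, W))     # stand-in convolutional feature map

Wx = rng.standard_normal((4 * hid, C)) * 0.1    # LSTM input weights
Wh = rng.standard_normal((4 * hid, hid)) * 0.1  # LSTM recurrent weights
b = np.zeros(4 * hid)
Wloc = rng.standard_normal((3, hid)) * 0.1      # h -> (scale, ty, tx)
Wcls = rng.standard_normal((L, hid)) * 0.1      # h -> label scores

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(hid); c = np.zeros(hid); scores = np.full(L, -np.inf)
for _ in range(4):                         # recurrently attend to 4 regions
    s, ty, tx = sigmoid(Wloc @ h)          # simplified transformer parameters
    side = max(2, int(s * H))              # attended square side length
    y0, x0 = int(ty * (H - side)), int(tx * (W - side))
    region = fmap[:, y0:y0 + side, x0:x0 + side].mean(axis=(1, 2))  # pooled crop

    gates = Wx @ region + Wh @ h + b       # one LSTM step on the attended region
    i, f, o, u = np.split(gates, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
    h = sigmoid(o) * np.tanh(c)
    scores = np.maximum(scores, Wcls @ h)  # aggregate label scores over steps

print(sigmoid(scores))                     # multi-label probabilities
```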
Citations: 238
Region-Based Correspondence Between 3D Shapes via Spatially Smooth Biclustering
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.457
M. Denitto, S. Melzi, M. Bicego, U. Castellani, A. Farinelli, Mário A. T. Figueiredo, Yanir Kleiman, M. Ovsjanikov
Region-based correspondence (RBC) is a highly relevant and non-trivial computer vision problem. Given two 3D shapes, RBC seeks segments/regions on these shapes that can reliably be put in correspondence. The problem thus consists both in finding the regions and in determining the correspondences between them. This problem statement is similar to that of “biclustering”, implying that RBC can be cast as a biclustering problem. Here, we exploit this implication by tackling RBC via a novel biclustering approach, called S4B (spatially smooth spike-and-slab biclustering), which: (i) casts the problem in a probabilistic low-rank matrix factorization perspective; (ii) uses a spike-and-slab prior to induce sparsity; and (iii) is enriched with a spatial smoothness prior, based on geodesic distances, encouraging nearby vertices to belong to the same bicluster. This type of spatial prior cannot be used in classical biclustering techniques. We test the proposed approach on the FAUST dataset, outperforming both state-of-the-art RBC techniques and classical biclustering methods.
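A greatly simplified NumPy sketch of sparse, spatially smooth rank-1 factorization: hard thresholding replaces the probabilistic spike-and-slab prior, a ring adjacency replaces geodesic neighbourhoods on a mesh, and a planted block replaces a real cross-shape affinity matrix. It illustrates how smoothing plus sparsity recovers one corresponding region pair.

```python
import numpy as np

rng = np.random.default_rng(5)
Nv, Mv = 60, 60                            # vertices on shape A / shape B
C = rng.random((Nv, Mv)) * 0.1             # stand-in vertex affinity matrix
C[10:25, 30:45] += 1.0                     # one planted corresponding region pair

# Ring adjacency as a crude stand-in for geodesic neighbourhoods on shape A.
A = np.eye(Nv) + np.roll(np.eye(Nv), 1, axis=0) + np.roll(np.eye(Nv), -1, axis=0)
A /= A.sum(axis=1, keepdims=True)

u, v = rng.random(Nv), rng.random(Mv)
for _ in range(30):
    u = A @ (C @ v)                        # rank-1 power step + spatial smoothing
    u[u < 0.5 * u.max()] = 0.0             # hard sparsity in place of spike-and-slab
    u /= np.linalg.norm(u) + 1e-12
    v = C.T @ u
    v[v < 0.5 * v.max()] = 0.0
    v /= np.linalg.norm(v) + 1e-12

print(np.flatnonzero(u), np.flatnonzero(v))  # recovered region pair indices
```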
Citations: 6
2D-Driven 3D Object Detection in RGB-D Images
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.495
Jean Lahoud, Bernard Ghanem
In this paper, we present a technique that places 3D bounding boxes around objects in an RGB-D scene. Our approach makes the best use of 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques. We then use the 3D information to orient, place, and score bounding boxes around objects. We independently estimate the orientation of every object using previous techniques that utilize normal information. Object locations and sizes in 3D are learned using a multilayer perceptron (MLP). In the final step, we refine our detections based on object-class relations within a scene. Compared to state-of-the-art detection methods that operate almost entirely in the sparse 3D domain, extensive experiments on the well-known SUN RGB-D dataset [29] show that our proposed method is much faster (4.1 s per image) at detecting 3D objects in RGB-D images and performs better (3 mAP higher) than a state-of-the-art method that is 4.7 times slower, and comparably to a method that is two orders of magnitude slower. This work hints at the idea that 2D-driven object detection in 3D should be further explored, especially in cases where the 3D input is sparse.
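A NumPy sketch of the 2D-driven search-space reduction, assuming made-up pinhole intrinsics, random scene points, and a hypothetical 2D detection box; the final 3D box here is a crude percentile fit, whereas the paper regresses location and size with an MLP and estimates orientation from surface normals.

```python
import numpy as np

rng = np.random.default_rng(6)
K = np.array([[500., 0., 320.],            # assumed pinhole intrinsics
              [0., 500., 240.],
              [0., 0., 1.]])
pts = rng.uniform([-3., -2., 1.], [3., 2., 8.], size=(5000, 3))  # scene points

box2d = (280.0, 200.0, 360.0, 280.0)       # hypothetical 2D detection (x1,y1,x2,y2)

# Keep only the points whose projection falls inside the 2D box, i.e., the
# points inside the detection's viewing frustum.
proj = (K @ pts.T).T
uv = proj[:, :2] / proj[:, 2:3]
x1, y1, x2, y2 = box2d
inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
frustum = pts[inside]

# Crude 3D box from the frustum points (percentile extents around the median).
center = np.median(frustum, axis=0)
size = np.percentile(frustum, 95, axis=0) - np.percentile(frustum, 5, axis=0)
print(len(frustum), center.round(2), size.round(2))
```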
Citations: 146
Mutual Enhancement for Detection of Multiple Logos in Sports Videos
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.519
Yuan Liao, Xiaoqing Lu, Chengcui Zhang, Yongtao Wang, Zhi Tang
Detecting logo frequency and duration in sports videos gives sponsors an effective way to evaluate their advertising efforts. However, general-purpose object detection methods cannot address all the challenges in sports videos. In this paper, we propose a mutual-enhancement approach that improves the detection of a logo using information obtained from other logos occurring at the same time. Within a Fast R-CNN-based framework, we first introduce a homogeneity-enhanced re-ranking method that analyzes the characteristics of homogeneous logos in each frame, including type repetition, color consistency, and mutual exclusion. Unlike conventional enhancement mechanisms that improve weak proposals using dominant ones, our mutual method can also enhance relatively significant proposals using weak ones. Mutual enhancement is also built into our frame propagation mechanism, which improves logo detection by exploiting the continuity of logos across frames. We use a tennis video dataset and an associated logo collection for evaluation. Experiments show that the proposed method outperforms existing methods with higher accuracy.
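A minimal sketch of type-repetition re-ranking, with illustrative logo classes, scores, and an assumed enhancement weight; the paper's method additionally uses color consistency, mutual exclusion, and cross-frame propagation.

```python
from collections import defaultdict

# Hypothetical per-frame proposals as (logo class, detector score) pairs.
proposals = [("acme", 0.90), ("acme", 0.40), ("zenith", 0.60), ("acme", 0.35)]

by_class = defaultdict(list)
for cls, score in proposals:
    by_class[cls].append(score)

alpha = 0.5                                # assumed mutual-enhancement weight
reranked = []
for cls, score in proposals:
    same = by_class[cls]
    # Average score of the *other* same-class proposals: repeated logos
    # reinforce one another in both directions (weak can boost strong too).
    support = (sum(same) - score) / max(len(same) - 1, 1)
    reranked.append((cls, round(min(1.0, score + alpha * support), 3)))

print(reranked)
```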
Citations: 27