Latest Publications: 2015 IEEE International Conference on Computer Vision (ICCV)

Category-Blind Human Action Recognition: A Practical Recognition System
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.505 Pages: 4444-4452
Wenbo Li, Longyin Wen, M. Chuah, Siwei Lyu
Existing human action recognition systems for 3D sequences obtained from depth cameras are designed to cope with only one action category, either single-person action or two-person interaction, and are difficult to extend to scenarios where both categories co-exist. In this paper, we propose the category-blind human action recognition method (CHARM), which can recognize a human action without making assumptions about the action category. In our CHARM approach, we represent a human action class (either a single-person action or a two-person interaction) using a co-occurrence of motion primitives. Subsequently, we classify an action instance by matching its motion primitive co-occurrence patterns to each class representation, formulating the matching task as a set of maximum clique problems. We conduct extensive evaluations of CHARM using three datasets covering single-person actions, two-person interactions, and their mixtures. Experimental results show that CHARM performs favorably compared with several state-of-the-art methods for single-person action and two-person interaction recognition, without making explicit assumptions about the action category.
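The maximum clique formulation can be made concrete with a small sketch. The following is our illustration, not the authors' code: candidate matches between observed motion-primitive co-occurrences and a class model become graph nodes, pairwise-consistent matches are connected, and the best-scoring maximal clique gives the class matching score. The helpers `compatible` and `score`, and the use of `networkx` clique enumeration, are our assumptions.

```python
# Illustrative sketch (not the authors' code): scoring one action class by
# solving a maximum clique problem over co-occurrence match hypotheses.
import itertools
import networkx as nx

def best_matching_score(hypotheses, compatible, score):
    """hypotheses: candidate matches between observed motion-primitive
    co-occurrences and the class representation.
    compatible(a, b) -> bool: True if two matches can coexist.
    score(a) -> float: how well a single match fits the class model."""
    G = nx.Graph()
    G.add_nodes_from(range(len(hypotheses)))
    for i, j in itertools.combinations(range(len(hypotheses)), 2):
        if compatible(hypotheses[i], hypotheses[j]):
            G.add_edge(i, j)
    # Enumerate maximal cliques and keep the best-scoring consistent set.
    best = 0.0
    for clique in nx.find_cliques(G):
        best = max(best, sum(score(hypotheses[k]) for k in clique))
    return best

# The action instance is assigned to the class whose representation yields
# the highest clique score, with no single- vs two-person assumption.
```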
Citations: 67
Multi-View Complementary Hash Tables for Nearest Neighbor Search
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.132 Pages: 1107-1115
Xianglong Liu, Lei Huang, Cheng Deng, Jiwen Lu, B. Lang
Recent years have witnessed the success of hashing techniques in fast nearest neighbor search. In practice, many applications (e.g., visual search, object detection, image matching) have enjoyed the benefits of complementary hash tables and information fusion over multiple views. However, most prior research has focused on compact hash code learning; little work has studied how to build multiple complementary hash tables, much less how to adaptively integrate information stemming from multiple views. In this paper we present a novel multi-view complementary hash table method that learns complementary hash tables from data with multiple views. For a single multi-view table, using exemplar-based feature fusion, we approximate the inherent data similarities with a low-rank matrix and learn discriminative hash functions efficiently. To build complementary tables while maintaining scalable training and fast out-of-sample extension, an exemplar reweighting scheme is introduced to update the induced low-rank similarity in the sequential table construction framework; this brings mutual benefits between tables by placing greater importance on exemplars shared by mis-separated neighbors. Extensive experiments on three large-scale image datasets demonstrate that the proposed method significantly outperforms various naive solutions and state-of-the-art multi-table methods.
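To make the low-rank similarity idea tangible, here is a hedged numpy sketch under our own assumptions (RBF affinities to randomly chosen exemplars, a spectral relaxation for the hash functions); the paper's actual objective and reweighting scheme are more involved.

```python
# Hedged sketch: exemplar affinities give a low-rank similarity S ~ Z @ Z.T;
# hash functions come from a spectral relaxation. Kernel, exemplar choice,
# and bit count are our assumptions, not the paper's.
import numpy as np

def learn_hash_bits(X, exemplars, n_bits, gamma=1.0):
    d2 = ((X[:, None, :] - exemplars[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-gamma * d2)
    Z /= Z.sum(1, keepdims=True)            # normalized exemplar weights
    # Eigenvectors of the small m x m matrix Z.T @ Z span the same space as
    # the top eigenvectors of S, keeping training scalable in n.
    _, evecs = np.linalg.eigh(Z.T @ Z)
    W = evecs[:, -n_bits:]                  # top n_bits spectral directions
    return np.sign(Z @ W)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))             # fused multi-view features
exemplars = X[rng.choice(1000, 64, replace=False)]
codes = learn_hash_bits(X, exemplars, n_bits=16)   # +/-1 hash codes
```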
Citations: 52
Fine-Grained Change Detection of Misaligned Scenes with Varied Illuminations
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.149 Pages: 1260-1268
Wei Feng, Fei-Peng Tian, Qian Zhang, N. Zhang, Liang Wan, Ji-zhou Sun
Detecting fine-grained, subtle changes in a scene is critically important in practice. Previous change detection methods, which focus on detecting large-scale significant changes, cannot do this well. This paper proposes a feasible end-to-end approach to this challenging problem. We start from active camera relocation, which quickly relocates the camera to nearly the same pose and position as the previous observation. To guarantee detection sensitivity and accuracy for minute changes, in each observation we capture a group of images under multiple illuminations, which need only be roughly aligned with the previous lighting conditions. Given observations from two different times, we formulate fine-grained change detection as a joint optimization problem over three related factors: the normal-aware lighting difference, the camera geometry correction flow, and the real scene change mask. We solve for the three factors in a coarse-to-fine manner and achieve reliable change decisions through rank minimization. We build three real-world datasets to benchmark fine-grained change detection of misaligned scenes under varied multiple lighting conditions. Extensive experiments show the superior performance of our approach over state-of-the-art change detection methods and its ability to distinguish real scene changes from false ones caused by lighting variations.
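The rank-minimization flavor of the change decision can be illustrated with a small robust-PCA-style loop: stacked, roughly aligned observations decompose into a low-rank part (scene plus lighting) and a sparse part (candidate changes). This is our simplification, not the paper's full three-factor optimization.

```python
# Illustrative only: low-rank + sparse decomposition via singular-value
# thresholding, as a stand-in for the paper's rank-minimization decision.
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def rpca_changes(D, lam=None, n_iter=100, mu=1.0):
    """D: each column is one vectorized, roughly aligned observation."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * soft(sig, 1.0 / mu)) @ Vt      # low-rank scene + lighting
        S = soft(D - L + Y / mu, lam / mu)      # sparse change support
        Y += mu * (D - L - S)                   # dual ascent
    return L, S
```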
Citations: 32
Multiple Granularity Descriptors for Fine-Grained Categorization
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.276 Pages: 2399-2406
Dequan Wang, Zhiqiang Shen, Jie Shao, Wei Zhang, X. Xue, Zeyu Zhang
Fine-grained categorization, which aims to distinguish subordinate-level categories such as bird species or dog breeds, is an extremely challenging task. This is due to two main issues: how to localize discriminative regions for recognition and how to learn sophisticated features for representation. Neither is easy to handle when labeled data are insufficient. We leverage the fact that a subordinate-level object already has other labels in its ontology tree. These "free" labels can be used to train a series of CNN-based classifiers, each specialized at one grain level. The internal representations of these networks have different regions of interest, allowing the construction of multi-grained descriptors that encode informative and discriminative features covering all the grain levels. Our multiple granularity framework can be learned with the weakest supervision, requiring only image-level labels and avoiding labor-intensive bounding box or part annotations. Experimental results on three challenging fine-grained image datasets demonstrate that our approach outperforms state-of-the-art algorithms, including those requiring strong labels.
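A toy sketch of the multi-grained descriptor idea, with linear classifiers standing in for the grain-level CNNs and integer division (`fine // 2`, `fine // 4`) standing in for a real ontology tree; everything here, including the use of class probabilities as the "internal representation", is our simplification.

```python
# Toy sketch (our illustration): one classifier per ontology grain, trained
# on the "free" coarser labels; their outputs concatenate into a
# multi-grained descriptor for the finest-level decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 128))              # image features (placeholder)
fine = rng.integers(0, 12, 600)              # e.g. 12 bird species
family = fine // 2                           # coarser "free" ontology label
order = fine // 4                            # coarsest "free" ontology label

# One classifier per grain level; class-probability outputs act as the
# grain-specific representation.
levels = [LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
          for y in (order, family, fine)]

descriptor = np.hstack(levels)               # multi-grained descriptor
final = LogisticRegression(max_iter=1000).fit(descriptor, fine)
print(final.score(descriptor, fine))
```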
Citations: 199
Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.37 Pages: 253-261
Jingwei Huang, Huarong Chen, Bin Wang, Stephen Lin
We present an automatic thumbnail generation technique based on two essential considerations: how well the thumbnail visually represents the original photograph, and how well the foreground can be recognized after the cropping and downsizing steps of thumbnailing. These factors, while important for the image indexing purpose of thumbnails, have largely been ignored by previous methods, which instead are designed to highlight salient content while disregarding the effects of downsizing. We propose a set of image features for modeling these two considerations and learn how to balance their relative effects on thumbnail generation by training on image pairs composed of photographs and their corresponding thumbnails created by an expert photographer. Experiments show the effectiveness of this approach on a variety of images, as well as its advantages over related techniques.
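As a rough illustration of the two scoring terms, the sketch below uses color-histogram intersection for visual representativeness and surviving foreground area for recognizability; both feature choices and the fixed weight `w` are our simplifications of what the paper learns from the expert-made pairs.

```python
# Hedged sketch of the two thumbnail criteria (images assumed in [0, 1]).
import numpy as np

def representativeness(image, crop):
    # Histogram intersection between the crop and the full photograph.
    h1, _ = np.histogram(image, bins=32, range=(0, 1), density=True)
    h2, _ = np.histogram(crop, bins=32, range=(0, 1), density=True)
    return np.minimum(h1, h2).sum() / h1.sum()

def recognizability(fg_mask_crop, thumb_side=64):
    # Foreground pixels that survive downsizing, relative to thumbnail area;
    # the 20% target fraction is an arbitrary illustration constant.
    scale = thumb_side ** 2 / fg_mask_crop.size
    return min(1.0, fg_mask_crop.sum() * scale / (0.2 * thumb_side ** 2))

def thumbnail_score(image, crop, fg_mask_crop, w=0.5):
    # In the paper the balance between the terms is learned, not fixed.
    return (w * representativeness(image, crop)
            + (1 - w) * recognizability(fg_mask_crop))
```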
Citations: 20
Unsupervised Domain Adaptation for Zero-Shot Learning
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.282 Pages: 2452-2460
Elyor Kodirov, T. Xiang, Zhenyong Fu, S. Gong
Zero-shot learning (ZSL) can be considered a special case of transfer learning in which the source and target domains have different tasks/label spaces and the target domain is unlabelled, providing little guidance for the knowledge transfer. A ZSL method typically assumes that the two domains share a common semantic representation space, into which a visual feature vector extracted from an image/video can be projected/embedded using a projection function. Existing approaches learn the projection function from the source domain and apply it to the target domain without adaptation. They are thus based on naive knowledge transfer, and the learned projections are prone to the domain shift problem. In this paper a novel ZSL method is proposed based on unsupervised domain adaptation. Specifically, we formulate a novel regularised sparse coding framework that uses the target domain class labels' projections in the semantic space to regularise the learned target domain projection, thus effectively overcoming the projection domain shift problem. Extensive experiments on four object and action recognition benchmark datasets show that the proposed ZSL method significantly outperforms the state of the art.
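A minimal sketch of regularised sparse coding via ISTA, in our own notation: target features are coded over a dictionary of semantic prototypes, with an extra quadratic term pulling codes toward the target classes' semantic projections. The variable names and the ISTA solver are assumptions; the paper's exact formulation differs.

```python
# Our notation, not the paper's: X (n, d) target features, D (k, d)
# semantic-prototype dictionary, P (n, k) semantic projections used as the
# regulariser that counters projection domain shift.
import numpy as np

def ista_zsl(X, D, P, lam=0.1, beta=0.5, n_iter=200):
    A = np.zeros((X.shape[0], D.shape[0]))
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + beta)   # Lipschitz step size
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T + beta * (A - P)     # smooth part
        A = A - step * grad
        A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0)  # l1 prox
    return A                                          # sparse codes
```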
Citations: 375
Projection onto the Manifold of Elongated Structures for Accurate Extraction
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.44 Pages: 316-324
A. Sironi, V. Lepetit, P. Fua
Detection of elongated structures in 2D images and 3D image stacks is a critical prerequisite in many applications, and machine learning-based approaches have recently been shown to deliver superior performance. However, these methods essentially classify individual locations and do not explicitly model the strong relationship that exists between neighboring ones. As a result, isolated erroneous responses, discontinuities, and topological errors are present in the resulting score maps. We solve this problem by projecting patches of the score map onto their nearest neighbors in a set of ground truth training patches. Our algorithm induces global spatial consistency on the classifier score map and returns results that are provably geometrically consistent. We apply our algorithm to challenging datasets in four different domains and show that it compares favorably to state-of-the-art methods.
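The projection step lends itself to a short sketch: slide a window over the score map, replace each patch with its nearest neighbor among ground-truth patches, and average the overlaps. Patch size, stride, and the use of `sklearn`'s `NearestNeighbors` are our choices for illustration.

```python
# Rough sketch of the projection step (sizes and index are our choices).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def project_score_map(score_map, gt_patches, p=8):
    """gt_patches: (N, p, p) patches cut from ground-truth score maps."""
    index = NearestNeighbors(n_neighbors=1).fit(
        gt_patches.reshape(len(gt_patches), -1))
    out = np.zeros_like(score_map, dtype=float)
    cnt = np.zeros_like(score_map, dtype=float)
    H, W = score_map.shape
    for y in range(0, H - p + 1, p // 2):        # half-patch overlap
        for x in range(0, W - p + 1, p // 2):
            patch = score_map[y:y + p, x:x + p].reshape(1, -1)
            _, idx = index.kneighbors(patch)     # nearest ground-truth patch
            out[y:y + p, x:x + p] += gt_patches[idx[0, 0]]
            cnt[y:y + p, x:x + p] += 1
    return out / np.maximum(cnt, 1)              # average the overlaps
```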
Citations: 36
Single Image Pop-Up from Discriminatively Learned Parts
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.112 Pages: 927-935
Menglong Zhu, Xiaowei Zhou, Kostas Daniilidis
We introduce a new approach for estimating a fine-grained 3D shape and continuous pose of an object from a single image. Given a training set of view exemplars, we learn and select appearance-based discriminative parts, which are mapped onto the 3D model through a facility location optimization. The training set of 3D models is summarized into a set of basis shapes from which we can generalize by linear combination. Given a test image, we detect hypotheses for each part. The main challenge is to select from these hypotheses while computing the 3D pose and shape coefficients at the same time. To achieve this, we optimize a function that simultaneously considers the appearance matching of the parts and the geometric reprojection error. We apply the alternating direction method of multipliers (ADMM) to minimize the resulting convex function. Our main and novel contribution is the simultaneous solution of part localization and detailed 3D geometry estimation by maximizing both appearance and geometric compatibility with convex relaxation.
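The paper's joint objective is problem-specific, so rather than guess at it, the sketch below shows only the ADMM machinery itself, on the standard lasso problem: an exactly solvable quadratic subproblem, a proximal (soft-thresholding) step, and a scaled dual update.

```python
# ADMM on lasso: min_x 0.5*||A x - b||^2 + lam*||x||_1.
# This illustrates the solver the authors use, not their objective.
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    M = np.linalg.inv(AtA + rho * np.eye(n))       # cache the x-update solve
    for _ in range(n_iter):
        x = M @ (Atb + rho * (z - u))              # quadratic subproblem
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)
        u += x - z                                 # scaled dual update
    return z
```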
Citations: 23
Learning Deep Representation with Large-Scale Attributes
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.220 Pages: 1895-1903
Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang
Learning strong feature representations from large-scale supervision has achieved remarkable success in computer vision with the emergence of deep learning techniques. It is driven by big visual data with rich annotations. This paper contributes a large-scale object attribute database that contains rich attribute annotations (over 300 attributes) for ~180k samples and 494 object classes. Based on the ImageNet object detection dataset, it annotates rotation, viewpoint, object part location, part occlusion, part existence, common attributes, and class-specific attributes. We then use this dataset to train deep representations and extensively evaluate how these attributes help on the general object detection task. To make better use of the attribute annotations, a deep learning scheme is proposed that models the relationships among attributes and hierarchically clusters them into semantically meaningful mixture types. Experimental results show that the attributes help learn better features, improving object detection accuracy by 2.6% mAP on the ILSVRC 2014 object detection dataset and 2.4% mAP on the PASCAL VOC 2007 object detection dataset. The improvement generalizes well across datasets.
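The hierarchical clustering of attributes into mixture types can be sketched as follows, on synthetic binary annotations; the co-occurrence-based distance and the choice of 20 clusters are our assumptions, not the paper's recipe.

```python
# Loose sketch: group correlated attributes into mixture types via
# hierarchical clustering of an attribute co-occurrence distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
A = (rng.random((500, 300)) < 0.1).astype(float)   # samples x attributes

co = A.T @ A                                       # attribute co-occurrence
norm = np.sqrt(np.outer(co.diagonal(), co.diagonal()))
d = 1.0 - co / np.maximum(norm, 1e-9)              # cosine-like distance

# linkage takes the condensed (upper-triangle) form of a distance matrix.
Z = linkage(d[np.triu_indices_from(d, k=1)], method="average")
mixture_type = fcluster(Z, t=20, criterion="maxclust")   # 20 mixture types
```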
Citations: 23
Airborne Three-Dimensional Cloud Tomography
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.386 Pages: 3379-3387
Aviad Levis, Y. Schechner, Amit Aides, A. Davis
We seek to sense the three-dimensional (3D) volumetric distribution of scatterers in a heterogeneous medium. An important case study for such a medium is the atmosphere. Atmospheric contents and their role in Earth's radiation balance carry significant uncertainties with regard to scattering components: aerosols and clouds. Clouds, made of water droplets, also lead to local effects such as precipitation and shadows. Our sensing approach is computational tomography using passive multi-angular imagery. For light-matter interaction that accounts for multiple scattering, we use the 3D radiative transfer equation as the forward model. Volumetric recovery by inverting this model suffers from a computational bottleneck at large scales, which involve many unknowns. We take steps that make this tomography tractable, without approximating the scattering order or angle range.
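The inversion structure can be conveyed with a heavily simplified sketch in which a random linear operator `F` stands in for the 3D radiative transfer forward model (the real model is nonlinear in the scatterer densities); projected gradient descent fits the multi-angular measurements under a nonnegativity constraint.

```python
# Heavily simplified stand-in for tomographic inversion: a linear forward
# operator F replaces the radiative transfer equation so only the loop's
# structure (fit measurements, enforce physical constraints) is shown.
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_pixels = 500, 800
F = rng.random((n_pixels, n_voxels)) / n_voxels   # stand-in forward model
beta_true = np.abs(rng.normal(size=n_voxels))     # scatterer densities
images = F @ beta_true                            # multi-angular measurements

beta = np.zeros(n_voxels)
step = 1.0 / np.linalg.norm(F, 2) ** 2
for _ in range(500):                              # projected gradient descent
    beta -= step * F.T @ (F @ beta - images)
    beta = np.maximum(beta, 0.0)                  # densities are nonnegative
print(float(np.linalg.norm(F @ beta - images)))   # residual fit error
```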
Citations: 69