
Latest Publications from the 2013 IEEE International Conference on Computer Vision

Coherent Object Detection with 3D Geometric Context from a Single Image
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.320
Jiyan Pan, T. Kanade
Objects in a real-world image cannot have arbitrary appearances, sizes, and locations, due to geometric constraints in 3D space. Such 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities, such that outlier suppression and noise reduction are achieved simultaneously. In addition, we evaluate those hypotheses using a CRF that considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experimental results show that our approach compares favorably with the state of the art.
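The hypothesis-generation half of this framework follows the familiar RANSAC pattern: sample minimal subsets of local entities, fit a global hypothesis, and keep the hypothesis with the strongest inlier support. The sketch below shows that generic skeleton on a toy 2D line-fitting problem; it is not the paper's generalized variant, and the model, tolerance, and iteration count are illustrative assumptions.

```python
import numpy as np

def ransac_line(points, n_iters=500, inlier_tol=0.05, seed=0):
    """Fit y = a*x + b to 2D points despite outliers.

    Plain RANSAC skeleton: minimal samples propose global hypotheses,
    the largest inlier set wins (outlier suppression), and a final
    refit over the inliers averages out noise (noise reduction).
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if abs(x2 - x1) < 1e-9:          # degenerate sample, skip
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refit on the consensus set.
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return (a, b), best_inliers
```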
Citations: 12
Robust Tucker Tensor Decomposition for Effective Image Representation
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.304
Miao Zhang, C. Ding
Many tensor-based algorithms have been proposed for the study of high-dimensional data in a large variety of computer vision and machine learning applications. However, most existing tensor analysis approaches are based on the Frobenius norm, which makes them sensitive to outliers: they minimize the sum of squared errors, which enlarges the influence of both outliers and large feature noise. In this paper, we propose a robust Tucker tensor decomposition model (RTD) that suppresses the influence of outliers by using an L1-norm loss function. Since optimization for L1-norm-based tensor analysis is much harder than standard tensor decomposition, we propose a simple and efficient algorithm to solve our RTD model. Moreover, tensor factorization-based image storage needs much less space than PCA-based methods. We carry out extensive experiments to evaluate the proposed algorithm and verify its robustness against image occlusions. Both numerical and visual results show that our RTD model is consistently more resistant to outliers than previous tensor and PCA methods.
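The robustness argument rests on a standard property of the two losses: the squared (Frobenius) loss is minimized by a mean-like estimate that outliers can drag arbitrarily far, whereas the L1 loss is minimized by a median-like estimate that largely ignores them. A toy one-dimensional illustration of that principle (not the RTD algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 0.1, size=100)   # clean measurements near 5
x[:10] = 100.0                       # inject gross outliers

# Squared loss -> mean; L1 loss -> median.
print(f"mean   (Frobenius minimizer): {x.mean():.2f}")      # ~14.5, corrupted
print(f"median (L1 minimizer):        {np.median(x):.2f}")  # ~5.0, robust
```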
Citations: 16
Visual Reranking through Weakly Supervised Multi-graph Learning
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.323
Cheng Deng, R. Ji, W. Liu, D. Tao, Xinbo Gao
Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend is to employ a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. A major challenge for current reranking methods, however, is how to take full advantage of the complementary properties of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo-positive instances, which are inevitably noisy, fail to reveal this complementary property, and thus lead to inferior reranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which intra-graph and inter-graph constraints are simultaneously imposed to encode affinities within a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of each individual feature modality. Meanwhile, such learning can yield a few anchors in the graphs that crucially enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically orders the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state of the art.
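A minimal flavor of graph fusion for reranking, assuming the per-modality affinity matrices and their fusion weights are simply given: combine the graphs and diffuse a query signal over the result (standard manifold ranking). Co-RMGL itself learns the cross-graph consistency and denoises the pseudo-labels rather than fixing the weights, so treat this purely as a sketch.

```python
import numpy as np

def fuse_and_rank(affinities, weights, query_idx, alpha=0.9):
    """Rank items by diffusing a query signal over a fused graph.

    affinities: list of (n, n) nonnegative similarity matrices, one
                per feature modality (connected graphs assumed).
    weights:    per-modality fusion weights, assumed given here.
    """
    W = sum(w * A for w, A in zip(weights, affinities))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))            # D^-1/2 W D^-1/2
    y = np.zeros(W.shape[0])
    y[query_idx] = 1.0                         # query indicator
    n = W.shape[0]
    # Closed-form manifold ranking: f = (1 - a)(I - aS)^-1 y
    f = np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * y)
    return np.argsort(-f)                      # best-first ordering
```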
Citations: 79
Decomposing Bag of Words Histograms
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.45
Ankit Gandhi, Alahari Karteek, C. V. Jawahar
We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers that can effectively discriminate the object categories present in the image. Our decomposition bypasses the harder problems of accurately localizing and segmenting objects. We evaluate our method on a wide variety of composite histograms and compare it with MRF-based solutions. Beyond measuring the accuracy of the decomposition, we also show the utility of the estimated object and background histograms for image classification on the PASCAL VOC 2007 dataset.
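If the component histograms were known, the decomposition would reduce to estimating non-negative mixing weights for the composite histogram. The paper works from linear classifiers instead, but a non-negative least-squares toy with hypothetical prototype histograms shows the shape of the problem:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical prototype histograms, one column per category.
B = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.1],
              [0.1, 0.3, 0.7]])
g = 0.5 * B[:, 0] + 0.5 * B[:, 2]   # composite image histogram

coef, _ = nnls(B, g)                # non-negative mixing weights
print(coef)                         # ~[0.5, 0.0, 0.5]
```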
Citations: 6
Drosophila Embryo Stage Annotation Using Label Propagation
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.139
T. Kazmar, E. Kvon, A. Stark, Christoph H. Lampert
In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe the principle underlying it is interesting not only for biologists but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images taken under very different conditions but also leads to rather noisy predictions; the other is a label propagation step based on a more powerful similarity measure that, however, is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither method achieves satisfactory results on its own, their combination achieves prediction quality comparable to human performance.
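The propagation component can be summarized by the classic iteration F ← αSF + (1−α)Y over a similarity graph. A minimal sketch, assuming S is already symmetrically normalized, Y holds the noisy classifier scores, and the restriction to consistent data subsets is handled by the caller:

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.8, n_iters=50):
    """Zhou-style label propagation.

    S: (n, n) normalized similarity matrix (one consistent subset).
    Y: (n, c) per-class scores; rows for confident items hold the
       classifier's noisy predictions, other rows are zero.
    """
    F = Y.astype(float).copy()
    for _ in range(n_iters):
        F = alpha * S @ F + (1 - alpha) * Y   # diffuse, anchor to Y
    return F.argmax(axis=1)                   # hard stage labels
```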
Citations: 5
Domain Adaptive Classification
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.324
Fatemeh Mirrashed, Mohammad Rastegari
We propose an unsupervised domain adaptation method that exploits the intrinsic compact structures of categories across different domains using binary attributes. Our method directly optimizes for classification in the target domain. The key insight is to find attributes that are discriminative across categories and predictable across domains. We achieve performance that significantly exceeds state-of-the-art results on standard benchmarks. In fact, in many cases our method reaches same-domain performance, the upper bound for unsupervised domain adaptation scenarios.
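One hedged way to read the key insight in code: score candidate binary attributes by how discriminative they are on the labeled source domain minus how much their activation statistics shift on the unlabeled target domain, then classify in the selected attribute space. The random-hyperplane attributes, the scoring rule, and the binary-label assumption below are illustrative stand-ins, not the paper's learned optimization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_attributes(Xs, ys, Xt, n_candidates=200, n_keep=16, seed=0):
    """Pick binary attributes that separate the two source classes
    yet fire at similar rates in source and target (ys in {0, 1})."""
    rng = np.random.default_rng(seed)
    planes, scores = [], []
    for _ in range(n_candidates):
        w = rng.normal(size=Xs.shape[1])       # random hyperplane
        bs = (Xs @ w > 0).astype(float)
        bt = (Xt @ w > 0).astype(float)
        disc = abs(bs[ys == 0].mean() - bs[ys == 1].mean())
        shift = abs(bs.mean() - bt.mean())     # cross-domain drift
        planes.append(w)
        scores.append(disc - shift)
    keep = np.argsort(scores)[-n_keep:]
    W = np.stack([planes[i] for i in keep], axis=1)
    clf = LogisticRegression().fit((Xs @ W > 0).astype(float), ys)
    return W, clf                              # apply to (Xt @ W > 0)
```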
Citations: 18
Robust Dictionary Learning by Error Source Decomposition
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.276
Zhuoyuan Chen, Ying Wu
Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases on clean data. In practice, however, both training and testing data may be corrupted, containing noise and outliers. Although recent studies have attempted to cope with corrupted data and achieved encouraging results in the testing phase, handling corruption in the training phase remains a very difficult problem. In contrast to most existing methods, which learn the dictionary from clean data, this paper targets corruptions and outliers in the training data for dictionary learning. We propose a general method to decompose the reconstruction residual into two components: a non-sparse component for small, universal noise and a sparse component for large outliers. In addition, further analysis reveals the connection between our approach and the "partial" dictionary learning approach, which updates only part of the prototypes (the informative code words) while keeping the remaining (noisy) code words fixed. Experiments on synthetic data as well as real applications show satisfactory performance of this new robust dictionary learning approach.
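The two-component split has a convenient closed form once the sparse part carries an L1 penalty: minimizing ½‖R − S‖²_F + λ‖S‖₁ over S is elementwise soft-thresholding, and what remains is the small dense noise. A sketch of that single step, with the alternating dictionary updates around it omitted:

```python
import numpy as np

def split_residual(R, lam):
    """Split a reconstruction residual R into sparse outliers S and
    dense small noise G = R - S.

    S solves min_S 0.5 * ||R - S||_F^2 + lam * ||S||_1, whose
    solution is elementwise soft-thresholding of R at lam.
    """
    S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return R - S, S
```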
Citations: 12
Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.172
Bingbing Ni, P. Moulin
We aim to discover, without supervision, the human motion patterns involved in manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in the motion patterns associated with the various types of objects being manipulated, so manually defining motion primitives is infeasible. Second, some motion patterns are shared among different manipulated objects while others are object-specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior in order to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model discovers motion pattern groups associated with different types of manipulated objects via a shared manipulation pattern dictionary, whose size is inferred automatically. Comprehensive experiments on two assisted-living benchmarks and a cooking motion dataset demonstrate the superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
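The claim that the dictionary size is inferred automatically comes from the Dirichlet-process prior: stick-breaking weights decay so that only a data-dependent handful of atoms carries appreciable mass. A truncated stick-breaking draw illustrates this behavior (a sample from the prior only, not the paper's inference procedure):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, seed=0):
    """Draw truncated Dirichlet-process mixture weights.

    Each weight is the fraction broken off the remaining stick; with
    concentration alpha, only a few atoms (candidate manipulation
    patterns) receive non-negligible weight.
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

w = stick_breaking(alpha=2.0, n_atoms=50)
print("effective patterns:", int((w > 1e-3).sum()))
```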
Citations: 2
Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.218
Xiao Cai, F. Nie, Weidong (Tom) Cai, Heng Huang
Automatic image categorization has become increasingly important with the development of the Internet and the growth of image databases. Although image categorization can be formulated as a typical multi-class classification problem, real-world images raise two major challenges. On one hand, although more labeled training data may improve prediction performance, obtaining image labels is a time-consuming and biased process. On the other hand, more and more visual descriptors have been proposed to describe the objects and scenes appearing in images, and different features describe different aspects of the visual characteristics. Therefore, how to integrate heterogeneous visual features for semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrating heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Treating each type of feature as one modality and taking advantage of the large amount of unlabeled data, our new adaptive multi-modal semi-supervised classification (AMMSS) algorithm learns a commonly shared class indicator matrix and the weights for the different modalities (image features) simultaneously.
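A toy alternation in the spirit of this approach, with deliberately simplified updates rather than the paper's exact ones: propagate the class indicator matrix on a weighted sum of per-modality graphs, then reweight each modality by how smoothly the propagated labels sit on its graph.

```python
import numpy as np

def ammss_sketch(graphs, Y, alpha=0.8, n_outer=10, n_inner=30):
    """Alternate label propagation and modality reweighting.

    graphs: list of (n, n) symmetrically normalized similarity
            matrices, one per feature modality.
    Y:      (n, c) one-hot labels for labeled images, zeros elsewhere.
    """
    n = Y.shape[0]
    w = np.full(len(graphs), 1.0 / len(graphs))   # uniform start
    F = Y.astype(float).copy()
    for _ in range(n_outer):
        S = sum(wi * Si for wi, Si in zip(w, graphs))
        for _ in range(n_inner):                  # propagation step
            F = alpha * S @ F + (1 - alpha) * Y
        # Dirichlet energy: small when labels vary smoothly on graph i.
        energy = np.array([np.trace(F.T @ (np.eye(n) - Si) @ F)
                           for Si in graphs])
        w = np.exp(-(energy - energy.min()))      # stabilized softmax
        w /= w.sum()                              # modality weights
    return F.argmax(axis=1), w
```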
Citations: 114
Modifying the Memorability of Face Photographs
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.397
A. Khosla, Wilma A. Bainbridge, A. Torralba, A. Oliva
Contemporary life bombards us with many new images of faces every day, which places non-trivial demands on human memory. The vast majority of face photographs are intended to be remembered, whether because of personal relevance, commercial interests, or because the pictures were deliberately designed to be memorable. Can we automatically make a portrait more memorable or more forgettable? Here, we provide a method to modify the memorability of individual face photographs while keeping the identity and other facial traits (e.g., age, attractiveness, and emotional magnitude) of the individual fixed. We show in a crowdsourcing experiment that face photographs manipulated to be more memorable (or more forgettable) are indeed more often remembered (or forgotten), with an accuracy of 74%. Quantifying and modifying the 'memorability' of a face lends itself to many useful applications in computer vision and graphics, such as mnemonic aids for learning, photo-editing applications for social networks, and tools for designing memorable advertisements.
Citations: 107