We introduce a convex optimization modeling framework that transforms a convex optimization problem expressed in a form natural and convenient for the user into an equivalent cone program in a way that preserves fast linear transforms in the original problem. By representing linear functions in the transformation process not as matrices, but as graphs that encode composition of abstract linear operators, we arrive at a matrix-free cone program, i.e., one whose data matrix is represented by an abstract linear operator and its adjoint. This cone program can then be solved by a matrix-free cone solver. By combining the matrix-free modeling framework and cone solver, we obtain a general method for efficiently solving convex optimization problems involving fast linear transforms.
{"title":"Convex Optimization with Abstract Linear Operators","authors":"Steven Diamond, Stephen P. Boyd","doi":"10.1109/ICCV.2015.84","DOIUrl":"https://doi.org/10.1109/ICCV.2015.84","url":null,"abstract":"We introduce a convex optimization modeling framework that transforms a convex optimization problem expressed in a form natural and convenient for the user into an equivalent cone program in a way that preserves fast linear transforms in the original problem. By representing linear functions in the transformation process not as matrices, but as graphs that encode composition of abstract linear operators, we arrive at a matrix-free cone program, i.e., one whose data matrix is represented by an abstract linear operator and its adjoint. This cone program can then be solved by a matrix-free cone solver. By combining the matrix-free modeling framework and cone solver, we obtain a general method for efficiently solving convex optimization problems involving fast linear transforms.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"53 1","pages":"675-683"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88770474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, binary coding techniques have become increasingly popular because of their efficiency in handling large-scale computer vision applications. It has been demonstrated that binary coding techniques which leverage supervised information can significantly enhance coding quality and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions which compress data samples into binary codes. However, few methods learn coding functions that optimize the precision at the top of the ranking list induced by the Hamming distances of the generated binary codes. In this paper, we propose a novel supervised binary coding approach, namely Top Rank Supervised Binary Coding (Top-RSBC), which explicitly optimizes the precision at the top positions of a Hamming-distance ranking list while preserving the supervision information. The core idea is to train coding functions under which mistakes at the top of a Hamming-distance ranking list are penalized more heavily than those at the bottom. To learn such coding functions, we relax the original discrete optimization objective with a continuous surrogate and derive a stochastic gradient descent method to optimize the surrogate objective. To further reduce the training time, we also design an online learning algorithm that optimizes the surrogate objective more efficiently. Empirical studies on three benchmark image datasets demonstrate that the proposed binary coding approach achieves superior image search accuracy over state-of-the-art methods.
{"title":"Top Rank Supervised Binary Coding for Visual Search","authors":"Dongjin Song, W. Liu, R. Ji, David A. Meyer, John R. Smith","doi":"10.1109/ICCV.2015.223","DOIUrl":"https://doi.org/10.1109/ICCV.2015.223","url":null,"abstract":"In recent years, binary coding techniques are becoming increasingly popular because of their high efficiency in handling large-scale computer vision applications. It has been demonstrated that supervised binary coding techniques that leverage supervised information can significantly enhance the coding quality, and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions which compress data samples into binary codes. However, few methods pursued the coding functions such that the precision at the top of a ranking list according to Hamming distances of the generated binary codes is optimized. In this paper, we propose a novel supervised binary coding approach, namely Top Rank Supervised Binary Coding (Top-RSBC), which explicitly focuses on optimizing the precision of top positions in a Hamming-distance ranking list towards preserving the supervision information. The core idea is to train the disciplined coding functions, by which the mistakes at the top of a Hamming-distance ranking list are penalized more than those at the bottom. To solve such coding functions, we relax the original discrete optimization objective with a continuous surrogate, and derive a stochastic gradient descent to optimize the surrogate objective. To further reduce the training time cost, we also design an online learning algorithm to optimize the surrogate objective more efficiently. Empirical studies based upon three benchmark image datasets demonstrate that the proposed binary coding approach achieves superior image search accuracy over the state-of-the-arts.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"12 1","pages":"1922-1930"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90671401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised and weakly-supervised visual learning in large image collections is critical for avoiding the time-consuming and error-prone process of manual labeling. Standard approaches rely on methods such as multiple-instance learning or graphical models, which can be computationally intensive and sensitive to initialization. On the other hand, simpler component analysis or clustering methods usually cannot achieve meaningful invariances or semantic interpretability. To address these issues, we present a simple but effective method called Semantic Component Analysis (SCA), which decomposes images into semantic components. Unsupervised SCA decomposes additive image representations into spatially meaningful visual components that naturally correspond to object categories. Using an overcomplete representation that allows for rich instance-level constraints and spatial priors, SCA gives improved results and more interpretable components than traditional matrix factorization techniques. If weakly-supervised information is available in the form of image-level tags, SCA factorizes a set of images into semantic groups of superpixels. We also provide qualitative connections to traditional methods for component analysis (e.g., Grassmann averages, PCA, and NMF). The effectiveness of our approach is validated on synthetic data and on the MSRC2 and SIFT Flow datasets, demonstrating competitive results in unsupervised and weakly-supervised semantic segmentation.
{"title":"Semantic Component Analysis","authors":"Calvin Murdock, F. D. L. Torre","doi":"10.1109/ICCV.2015.174","DOIUrl":"https://doi.org/10.1109/ICCV.2015.174","url":null,"abstract":"Unsupervised and weakly-supervised visual learning in large image collections are critical in order to avoid the time-consuming and error-prone process of manual labeling. Standard approaches rely on methods like multiple-instance learning or graphical models, which can be computationally intensive and sensitive to initialization. On the other hand, simpler component analysis or clustering methods usually cannot achieve meaningful invariances or semantic interpretability. To address the issues of previous work, we present a simple but effective method called Semantic Component Analysis (SCA), which provides a decomposition of images into semantic components. Unsupervised SCA decomposes additive image representations into spatially-meaningful visual components that naturally correspond to object categories. Using an overcomplete representation that allows for rich instance-level constraints and spatial priors, SCA gives improved results and more interpretable components in comparison to traditional matrix factorization techniques. If weakly-supervised information is available in the form of image-level tags, SCA factorizes a set of images into semantic groups of superpixels. We also provide qualitative connections to traditional methods for component analysis (e.g. Grassmann averages, PCA, and NMF). The effectiveness of our approach is validated through synthetic data and on the MSRC2 and Sift Flow datasets, demonstrating competitive results in unsupervised and weakly-supervised semantic segmentation.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1484-1492"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89461337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Person re-identification is an open and challenging problem in computer vision. Existing re-identification approaches focus on optimal methods for feature matching (e.g., metric learning approaches) or study the inter-camera transformations of such features. These methods rarely address the visual ambiguities shared among the first ranks. In this paper, we focus on this problem and introduce an unsupervised ranking optimization approach based on discriminant context information analysis. The proposed approach refines a given initial ranking by removing the visual ambiguities common to the first ranks, which is achieved by analyzing their content and context information. Extensive experiments have been conducted on three publicly available benchmark datasets with different baseline methods. Results demonstrate a remarkable improvement in the first positions of the ranking; regardless of the dataset, our method strongly outperforms state-of-the-art methods.
{"title":"Person Re-Identification Ranking Optimisation by Discriminant Context Information Analysis","authors":"Jorge García, N. Martinel, C. Micheloni, Alfredo Gardel Vicente","doi":"10.1109/ICCV.2015.154","DOIUrl":"https://doi.org/10.1109/ICCV.2015.154","url":null,"abstract":"Person re-identification is an open and challenging problem in computer vision. Existing re-identification approaches focus on optimal methods for features matching (e.g., metric learning approaches) or study the inter-camera transformations of such features. These methods hardly ever pay attention to the problem of visual ambiguities shared between the first ranks. In this paper, we focus on such a problem and introduce an unsupervised ranking optimization approach based on discriminant context information analysis. The proposed approach refines a given initial ranking by removing the visual ambiguities common to first ranks. This is achieved by analyzing their content and context information. Extensive experiments on three publicly available benchmark datasets and different baseline methods have been conducted. Results demonstrate a remarkable improvement in the first positions of the ranking. Regardless of the selected dataset, state-of-the-art methods are strongly outperformed by our method.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"1305-1313"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89974735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radial distortion for ordinary (non-fisheye) camera lenses has traditionally been modeled as an infinite series in the radial distance of an image pixel from the image center. While there is ample empirical evidence that such a model is accurate and sufficient for radial distortion calibration, there has been little analysis of the geometric and physical origins of radial distortion from a camera calibration perspective. In this paper, we show, using a thick-lens imaging model, that the variation of the entrance pupil location as a function of the incident image ray angle is directly responsible for radial distortion in captured images. Thus, contrary to the current state-of-the-art in camera calibration, radial distortion and entrance pupil movement are equivalent and need not be modeled together. By modeling only entrance pupil motion instead of radial distortion, we achieve two main benefits: first, we obtain comparable, if not better, pixel re-projection error than traditional methods; second, and more importantly, we can directly back-project a radially distorted image pixel along the true image ray that formed it. Using a thick-lens setting, we show that this back-projection is more accurate than the two-step method of undistorting an image pixel and then back-projecting it. We have applied this calibration method to the problem of generative depth-from-focus using a focal stack to obtain accurate depth estimates.
{"title":"On the Equivalence of Moving Entrance Pupil and Radial Distortion for Camera Calibration","authors":"Avinash Kumar, N. Ahuja","doi":"10.1109/ICCV.2015.270","DOIUrl":"https://doi.org/10.1109/ICCV.2015.270","url":null,"abstract":"Radial distortion for ordinary (non-fisheye) camera lenses has traditionally been modeled as an infinite series function of radial location of an image pixel from the image center. While there has been enough empirical evidence to show that such a model is accurate and sufficient for radial distortion calibration, there has not been much analysis on the geometric/physical understanding of radial distortion from a camera calibration perspective. In this paper, we show using a thick-lens imaging model, that the variation of entrance pupil location as a function of incident image ray angle is directly responsible for radial distortion in captured images. Thus, unlike as proposed in the current state-of-the-art in camera calibration, radial distortion and entrance pupil movement are equivalent and need not be modeled together. By modeling only entrance pupil motion instead of radial distortion, we achieve two main benefits, first, we obtain comparable if not better pixel re-projection error than traditional methods, second, and more importantly, we directly back-project a radially distorted image pixel along the true image ray which formed it. Using a thick-lens setting, we show that such a back-projection is more accurate than the two-step method of undistorting an image pixel and then back-projecting it. We have applied this calibration method to the problem of generative depth-from-focus using focal stack to get accurate depth estimates.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"255 1","pages":"2345-2353"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76999998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natural image modeling plays a key role in many vision problems such as image denoising. Image priors are widely used to regularize the denoising process, which is an ill-posed inverse problem. One category of denoising methods exploits priors (e.g., TV, sparsity) learned from external clean images to reconstruct the given noisy image, while another exploits internal priors (e.g., self-similarity) to reconstruct the latent image. Though internal-prior-based methods have achieved impressive denoising results, improving visual quality becomes very difficult as the noise level increases. In this paper, we propose to jointly exploit an external patch prior and an internal self-similarity prior, and we develop an external patch prior guided internal clustering algorithm for image denoising. It is known that natural image patches form multiple subspaces. By learning Gaussian mixture models (GMMs), similar image patches can be clustered and the subspaces can be learned. The GMMs learned from clean images are then used to guide the clustering of noisy patches of the input noisy image, followed by a low-rank approximation step that estimates the latent subspace for image recovery. Numerical experiments show that the proposed method outperforms many state-of-the-art denoising algorithms such as BM3D and WNNM.
{"title":"External Patch Prior Guided Internal Clustering for Image Denoising","authors":"Fei Chen, Lei Zhang, Huimin Yu","doi":"10.1109/ICCV.2015.76","DOIUrl":"https://doi.org/10.1109/ICCV.2015.76","url":null,"abstract":"Natural image modeling plays a key role in many vision problems such as image denoising. Image priors are widely used to regularize the denoising process, which is an ill-posed inverse problem. One category of denoising methods exploit the priors (e.g., TV, sparsity) learned from external clean images to reconstruct the given noisy image, while another category of methods exploit the internal prior (e.g., self-similarity) to reconstruct the latent image. Though the internal prior based methods have achieved impressive denoising results, the improvement of visual quality will become very difficult with the increase of noise level. In this paper, we propose to exploit image external patch prior and internal self-similarity prior jointly, and develop an external patch prior guided internal clustering algorithm for image denoising. It is known that natural image patches form multiple subspaces. By utilizing Gaussian mixture models (GMMs) learning, image similar patches can be clustered and the subspaces can be learned. The learned GMMs from clean images are then used to guide the clustering of noisy-patches of the input noisy images, followed by a low-rank approximation process to estimate the latent subspace for image recovery. Numerical experiments show that the proposed method outperforms many state-of-the-art denoising algorithms such as BM3D and WNNM.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"87 1","pages":"603-611"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75192741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose a novel method for depth image superresolution which combines recent advances in example-based upsampling with variational superresolution based on a known blur kernel. Most traditional depth superresolution approaches use additional high-resolution intensity images as guidance. In our method, we instead learn a dictionary of edge priors from an external database of high- and low-resolution examples. In a novel variational sparse coding approach, this dictionary is used to infer strong edge priors; in addition to the traditional sparse coding constraints, our optimization minimizes the difference in the overlap of neighboring edge patches. These edge priors then serve as anisotropic guidance for the higher-order regularization in a novel variational superresolution. Both the sparse coding and the variational superresolution of the depth are solved with a primal-dual formulation. In an exhaustive numerical and visual evaluation we show that our method clearly outperforms existing approaches on multiple real and synthetic datasets.
{"title":"Variational Depth Superresolution Using Example-Based Edge Representations","authors":"David Ferstl, M. Rüther, H. Bischof","doi":"10.1109/ICCV.2015.66","DOIUrl":"https://doi.org/10.1109/ICCV.2015.66","url":null,"abstract":"In this paper we propose a novel method for depth image superresolution which combines recent advances in example based upsampling with variational superresolution based on a known blur kernel. Most traditional depth superresolution approaches try to use additional high resolution intensity images as guidance for superresolution. In our method we learn a dictionary of edge priors from an external database of high and low resolution examples. In a novel variational sparse coding approach this dictionary is used to infer strong edge priors. Additionally to the traditional sparse coding constraints the difference in the overlap of neighboring edge patches is minimized in our optimization. These edge priors are used in a novel variational superresolution as anisotropic guidance of the higher order regularization. Both the sparse coding and the variational superresolution of the depth are solved based on a primal-dual formulation. In an exhaustive numerical and visual evaluation we show that our method clearly outperforms existing approaches on multiple real and synthetic datasets.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"20 1","pages":"513-521"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73593012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intuitive observations suggest that a baby may inherently possess the capability of recognizing a new visual concept (e.g., chair, dog) by learning from only very few positive instances taught by parents or others, and that this recognition capability can be gradually improved by exploring and interacting with real instances in the physical world. Inspired by these observations, we propose a computational model for weakly-supervised object detection based on prior knowledge modeling, exemplar learning, and learning with video contexts. The prior knowledge is modeled with a pre-trained Convolutional Neural Network (CNN). When very few instances of a new concept are given, an initial concept detector is built by exemplar learning over the deep features of the pre-trained CNN. A tracking solution is then used to discover more diverse instances from massive online weakly labeled videos. Once a positive instance is detected with a high score in a video, more instances, possibly from different view angles and distances, are tracked and accumulated, and the concept detector is fine-tuned on these new instances. This process is repeated until we obtain a mature concept detector. Extensive experiments on the Pascal VOC-07/10/12 object detection datasets [9] demonstrate the effectiveness of our framework: it beats state-of-the-art fully supervised performance by learning from very few samples per object category together with about 20,000 weakly labeled videos.
{"title":"Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection","authors":"Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan","doi":"10.1109/ICCV.2015.120","DOIUrl":"https://doi.org/10.1109/ICCV.2015.120","url":null,"abstract":"Intuitive observations show that a baby may inherently possess the capability of recognizing a new visual concept (e.g., chair, dog) by learning from only very few positive instances taught by parent(s) or others, and this recognition capability can be gradually further improved by exploring and/or interacting with the real instances in the physical world. Inspired by these observations, we propose a computational model for weakly-supervised object detection, based on prior knowledge modelling, exemplar learning and learning with video contexts. The prior knowledge is modeled with a pre-trained Convolutional Neural Network (CNN). When very few instances of a new concept are given, an initial concept detector is built by exemplar learning over the deep features the pre-trained CNN. The well-designed tracking solution is then used to discover more diverse instances from the massive online weakly labeled videos. Once a positive instance is detected/identified with high score in each video, more instances possibly from different view-angles and/or different distances are tracked and accumulated. Then the concept detector can be fine-tuned based on these new instances. This process can be repeated again and again till we obtain a very mature concept detector. Extensive experiments on Pascal VOC-07/10/12 object detection datasets [9] well demonstrate the effectiveness of our framework. It can beat the state-of-the-art full-training based performances by learning from very few samples for each object category, along with about 20,000 weakly labeled videos.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"27 1","pages":"999-1007"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73937184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active learning is an effective way to relieve the tedious work of manual annotation in many visual recognition applications, yet relatively little attention has been paid to multi-class active learning. In this paper, we propose a novel Gaussian process classifier model with multiple annotators for multi-class visual recognition. Expectation propagation (EP) is adopted for efficient approximate Bayesian inference in our probabilistic classification model. Based on the EP approximation, a generalized Expectation Maximization (GEM) algorithm is derived to estimate both the parameters for instances and the quality of each individual annotator. We also incorporate ideas from reinforcement learning to actively select both informative samples and high-quality annotators, which better explores the trade-off between exploitation and exploration. The experiments clearly demonstrate the efficacy of the proposed model.
{"title":"Multi-class Multi-annotator Active Learning with Robust Gaussian Process for Visual Recognition","authors":"Chengjiang Long, G. Hua","doi":"10.1109/ICCV.2015.325","DOIUrl":"https://doi.org/10.1109/ICCV.2015.325","url":null,"abstract":"Active learning is an effective way to relieve the tedious work of manual annotation in many applications of visual recognition. However, less research attention has been focused on multi-class active learning. In this paper, we propose a novel Gaussian process classifier model with multiple annotators for multi-class visual recognition. Expectation propagation (EP) is adopted for efficient approximate Bayesian inference of our probabilistic model for classification. Based on the EP approximation inference, a generalized Expectation Maximization (GEM) algorithm is derived to estimate both the parameters for instances and the quality of each individual annotator. Also, we incorporate the idea of reinforcement learning to actively select both the informative samples and the high-quality annotators, which better explores the trade-off between exploitation and exploration. The experiments clearly demonstrate the efficacy of the proposed model.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"20 1","pages":"2839-2847"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84750130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a method to detect disocclusions in video sequences of three-dimensional scenes and to partition the disoccluded regions into objects, defined by coherent deformations corresponding to surfaces in the scene. Our method infers deformation fields that are piecewise smooth by construction, without the need for an explicit regularizer and the associated choice of weight. It then partitions the disoccluded region and groups its components with objects by leveraging the complementarity of motion and appearance cues: where appearance changes within an object, motion can usually be reliably inferred and used for grouping; where appearance is close to constant, it can be used for grouping directly. We integrate both cues in an energy minimization framework, incorporate prior assumptions explicitly into the energy, and propose a numerical scheme.
{"title":"Self-Occlusions and Disocclusions in Causal Video Object Segmentation","authors":"Yanchao Yang, G. Sundaramoorthi, Stefano Soatto","doi":"10.1109/ICCV.2015.501","DOIUrl":"https://doi.org/10.1109/ICCV.2015.501","url":null,"abstract":"We propose a method to detect disocclusion in video sequences of three-dimensional scenes and to partition the disoccluded regions into objects, defined by coherent deformation corresponding to surfaces in the scene. Our method infers deformation fields that are piecewise smooth by construction without the need for an explicit regularizer and the associated choice of weight. It then partitions the disoccluded region and groups its components with objects by leveraging on the complementarity of motion and appearance cues: Where appearance changes within an object, motion can usually be reliably inferred and used for grouping. Where appearance is close to constant, it can be used for grouping directly. We integrate both cues in an energy minimization framework, incorporate prior assumptions explicitly into the energy, and propose a numerical scheme.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"74 1","pages":"4408-4416"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85145348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}