
Latest publications from the 2015 IEEE International Conference on Computer Vision (ICCV)

Learning Spatially Regularized Correlation Filters for Visual Tracking
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.490
Martin Danelljan, Gustav Häger, F. Khan, M. Felsberg
Robust and accurate visual tracking is one of the most challenging computer vision problems. Due to the inherent lack of training data, a robust approach for constructing a target appearance model is crucial. Recently, discriminatively learned correlation filters (DCF) have been successfully applied to address this problem for tracking. These methods utilize a periodic assumption of the training samples to efficiently learn a classifier on all patches in the target neighborhood. However, the periodic assumption also introduces unwanted boundary effects, which severely degrade the quality of the tracking model. We propose Spatially Regularized Discriminative Correlation Filters (SRDCF) for tracking. A spatial regularization component is introduced in the learning to penalize correlation filter coefficients depending on their spatial location. Our SRDCF formulation allows the correlation filters to be learned on a significantly larger set of negative training samples, without corrupting the positive samples. We further propose an optimization strategy, based on the iterative Gauss-Seidel method, for efficient online learning of our SRDCF. Experiments are performed on four benchmark datasets: OTB-2013, ALOV++, OTB-2015, and VOT2014. Our approach achieves state-of-the-art results on all four datasets. On OTB-2013 and OTB-2015, we obtain an absolute gain of 8.0% and 8.2% respectively, in mean overlap precision, compared to the best existing trackers.
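For reference, the learning objective sketched in the abstract takes the following form (our transcription in the paper's notation: x_k are training samples, y_k their desired Gaussian-shaped confidence outputs, alpha_k sample weights, f^l the per-channel filters, and * circular correlation):

    \varepsilon(f) = \sum_{k=1}^{t} \alpha_k \,\Big\| \sum_{l=1}^{d} x_k^l * f^l - y_k \Big\|^2 + \sum_{l=1}^{d} \big\| w \cdot f^l \big\|^2

Standard DCF corresponds to a constant spatial weight w (plain ridge regularization); SRDCF instead lets w grow smoothly toward the patch boundary, penalizing filter coefficients outside the target region and thereby suppressing the periodic boundary effects.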
Citations: 1749
Simultaneous Foreground Detection and Classification with Hybrid Features
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.378
Jaemyun Kim, Adín Ramírez Rivera, Byungyong Ryu, O. Chae
In this paper, we propose a hybrid background model that relies on edge and non-edge features of the image to produce the model. We encode these features into a coding scheme, which we call Local Hybrid Pattern (LHP), that selectively models edge and non-edge features of each pixel. Furthermore, we model each pixel with an adaptive code dictionary to represent the background dynamism, and update it by adding stable codes and discarding unstable ones. We weight each code in the dictionary to enhance its description of the pixel it models. The foreground is detected as the incoming codes that deviate from the dictionary. We can detect (as foreground or background) and classify (as edge or inner region) each pixel simultaneously. We tested our proposed method on existing databases with promising results.
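A minimal per-pixel sketch of the adaptive code dictionary described above, in Python; the matching rule, learning rate, decay, and thresholds are our illustrative assumptions, not the authors' LHP encoding:

    import numpy as np

    class PixelCodebook:
        """Toy adaptive code dictionary for a single pixel (illustrative only)."""

        def __init__(self, match_thresh=0.2, min_weight=0.01, decay=0.99):
            self.codes, self.weights = [], []
            self.match_thresh, self.min_weight, self.decay = match_thresh, min_weight, decay

        def observe(self, code, lr=0.05):
            """Return True (background) if `code` matches the dictionary; update either way."""
            code = np.asarray(code, dtype=float)
            self.weights = [w * self.decay for w in self.weights]  # age all codes
            if self.codes:
                d = [np.linalg.norm(code - c) for c in self.codes]
                i = int(np.argmin(d))
                if d[i] < self.match_thresh:
                    self.codes[i] = (1 - lr) * self.codes[i] + lr * code  # reinforce stable code
                    self.weights[i] += lr
                    self._prune()
                    return True
            self.codes.append(code)        # unmatched: foreground; add as tentative code
            self.weights.append(lr)
            self._prune()
            return False

        def _prune(self):
            # discard codes whose weight decayed below the stability threshold
            keep = [i for i, w in enumerate(self.weights) if w > self.min_weight]
            self.codes = [self.codes[i] for i in keep]
            self.weights = [self.weights[i] for i in keep]

A full model would run one such dictionary per pixel over LHP codes rather than raw intensity values.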
Citations: 13
Projection onto the Manifold of Elongated Structures for Accurate Extraction
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.44
A. Sironi, V. Lepetit, P. Fua
Detection of elongated structures in 2D images and 3D image stacks is a critical prerequisite in many applications, and machine-learning-based approaches have recently been shown to deliver superior performance. However, these methods essentially classify individual locations and do not explicitly model the strong relationship that exists between neighboring ones. As a result, isolated erroneous responses, discontinuities, and topological errors are present in the resulting score maps. We solve this problem by projecting patches of the score map to their nearest neighbors in a set of ground-truth training patches. Our algorithm induces global spatial consistency on the classifier score map and returns results that are provably geometrically consistent. We apply our algorithm to challenging datasets in four different domains and show that it compares favorably to state-of-the-art methods.
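The projection step admits a compact sketch: extract overlapping patches from the score map, replace each by its nearest neighbor among ground-truth training patches, and average the replacements back into a map. Patch size and plain averaging are our assumptions:

    import numpy as np
    from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d
    from sklearn.neighbors import NearestNeighbors

    def project_score_map(score, gt_patches, patch=9):
        """Project each patch of `score` onto its nearest ground-truth patch (toy sketch).
        score: (H, W) classifier score map; gt_patches: (N, patch, patch) training patches."""
        P = extract_patches_2d(score, (patch, patch))                 # all overlapping patches
        nn = NearestNeighbors(n_neighbors=1).fit(gt_patches.reshape(len(gt_patches), -1))
        _, idx = nn.kneighbors(P.reshape(len(P), -1))                 # nearest training patch
        proj = gt_patches[idx[:, 0]]
        return reconstruct_from_patches_2d(proj, score.shape)        # average the overlaps

This enforces that every local configuration of the output comes from the manifold of ground-truth patterns, which is what restores spatial consistency.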
Citations: 36
A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.28
Shida Beigpour, A. Kolb, Sven Kunz
In this paper, we provide a new, real photo dataset with precise ground truth for intrinsic image research. Prior ground-truth datasets have been restricted to rather simple illumination conditions and scene geometries, or have been enhanced using image synthesis methods. The dataset provided in this paper is based on complex multi-illuminant scenarios under multi-colored illumination conditions and challenging cast shadows. We provide full per-pixel intrinsic ground-truth data for these scenarios, i.e. reflectance, specularity, shading, and illumination, as well as preliminary depth information. Furthermore, we evaluate three state-of-the-art intrinsic image recovery methods using our dataset.
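The per-pixel ground truth corresponds to the usual intrinsic decomposition; in a common diffuse-plus-specular form (our notation, with \odot denoting per-pixel multiplication):

    I(p) = R(p) \odot S(p) + C(p)

where I is the observed image, R the reflectance, S the shading induced by the (here multi-colored) illumination, and C the specular component.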
Citations: 13
Fast and Effective L0 Gradient Minimization by Region Fusion
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.32
Nguyen Ho Man Rang, M. S. Brown
L0 gradient minimization can be applied to an input signal to control the number of non-zero gradients. This is useful for reducing small gradients generally associated with signal noise while preserving important signal features. In computer vision, L0 gradient minimization has found applications in image denoising, 3D mesh denoising, and image enhancement. Minimizing the L0 norm, however, is an NP-hard problem because of its non-convexity. As a result, existing methods rely on approximation strategies to perform the minimization. In this paper, we present a new method for L0 gradient minimization that is fast and effective. Our method uses a descent approach based on region fusion that converges faster than other methods while providing a better approximation of the optimal L0 norm. In addition, our method can be applied to both 2D images and 3D mesh topologies. The effectiveness of our approach is demonstrated on a number of examples.
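Concretely, the quantity being minimized is the standard L0-gradient objective, where λ trades data fidelity against the count of non-zero gradients:

    \min_{S} \; \sum_{p} (S_p - I_p)^2 + \lambda \cdot \#\{\, p : |\nabla S_p| \neq 0 \,\}

Here I is the input signal and S the piecewise-constant output; the region-fusion descent starts from single-pixel regions and fuses neighboring regions when doing so decreases the objective.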
Citations: 62
Single Image Pop-Up from Discriminatively Learned Parts
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.112
Menglong Zhu, Xiaowei Zhou, Kostas Daniilidis
We introduce a new approach for estimating a fine grained 3D shape and continuous pose of an object from a single image. Given a training set of view exemplars, we learn and select appearance-based discriminative parts which are mapped onto the 3D model through a facility location optimization. The training set of 3D models is summarized into a set of basis shapes from which we can generalize by linear combination. Given a test image, we detect hypotheses for each part. The main challenge is to select from these hypotheses and compute the 3D pose and shape coefficients at the same time. To achieve this, we optimize a function that considers simultaneously the appearance matching of the parts as well as the geometric reprojection error. We apply the alternating direction method of multipliers (ADMM) to minimize the resulting convex function. Our main and novel contribution is the simultaneous solution for part localization and detailed 3D geometry estimation by maximizing both appearance and geometric compatibility with convex relaxation.
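The "basis shapes ... generalize by linear combination" step can be written compactly; in our paraphrased notation, a new shape instance S is

    S = \sum_{i=1}^{n} c_i B_i

and the method jointly estimates the coefficients c_i, the continuous pose, and a selection among the per-part hypotheses by minimizing an appearance-matching term plus the geometric reprojection error, solving the resulting convex relaxation with ADMM.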
Citations: 23
Component-Wise Modeling of Articulated Objects
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.268
Valsamis Ntouskos, Marta Sanzari, B. Cafaro, F. Nardi, Fabrizio Natola, F. Pirri, M. A. Garcia
We introduce a novel framework for modeling articulated objects based on the aspects of their components. By decomposing the object into components, we divide the problem into smaller modeling tasks. After obtaining 3D models for each component aspect by employing a shape deformation paradigm, we merge them together, forming the object components. The final model is obtained by assembling the components using an optimization scheme which fits the respective 3D models to the corresponding apparent contours in a reference pose. The results suggest that our approach can produce realistic 3D models of articulated objects in reasonable time.
Citations: 18
Learning Deep Representation with Large-Scale Attributes
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.220
Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang
Learning strong feature representations from large-scale supervision has achieved remarkable success in computer vision with the emergence of deep learning techniques. It is driven by big visual data with rich annotations. This paper contributes a large-scale object attribute database that contains rich attribute annotations (over 300 attributes) for ~180k samples and 494 object classes. Based on the ImageNet object detection dataset, it annotates the rotation, viewpoint, object part location, part occlusion, part existence, common attributes, and class-specific attributes. We then use this dataset to train deep representations and extensively evaluate how these attributes are useful on the general object detection task. In order to make better use of the attribute annotations, a deep learning scheme is proposed that models the relationships among attributes and hierarchically clusters them into semantically meaningful mixture types. Experimental results show that the attributes are helpful in learning better features, improving object detection accuracy by 2.6% mAP on the ILSVRC 2014 object detection dataset and 2.4% mAP on the PASCAL VOC 2007 object detection dataset. This improvement generalizes well across datasets.
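The attribute-grouping step can be pictured with generic hierarchical clustering; a toy sketch in Python, where the Jaccard distance, average linkage, and the number of mixture types are our assumptions rather than the paper's scheme:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    A = (rng.random((494, 300)) > 0.7).astype(float)        # toy class-by-attribute matrix
    Z = linkage(A, method="average", metric="jaccard")      # cluster classes by attribute overlap
    mixture_type = fcluster(Z, t=20, criterion="maxclust")  # cut into 20 mixture types
    print(np.bincount(mixture_type))                        # cluster sizes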
Citations: 23
Unsupervised Domain Adaptation for Zero-Shot Learning
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.282
Elyor Kodirov, T. Xiang, Zhenyong Fu, S. Gong
Zero-shot learning (ZSL) can be considered a special case of transfer learning in which the source and target domains have different tasks/label spaces and the target domain is unlabelled, providing little guidance for the knowledge transfer. A ZSL method typically assumes that the two domains share a common semantic representation space, into which a visual feature vector extracted from an image/video can be projected/embedded using a projection function. Existing approaches learn the projection function from the source domain and apply it without adaptation to the target domain. They are thus based on naive knowledge transfer, and the learned projections are prone to the domain shift problem. In this paper a novel ZSL method is proposed based on unsupervised domain adaptation. Specifically, we formulate a novel regularised sparse coding framework which uses the projections of the target domain class labels in the semantic space to regularise the learned target domain projection, thus effectively overcoming the projection domain shift problem. Extensive experiments on four object and action recognition benchmark datasets show that the proposed ZSL method significantly outperforms the state of the art.
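A plausible instantiation of the regularised sparse coding described above (our reading of the abstract, not necessarily the paper's exact formulation): with target-domain features X_t, dictionary D, codes Z, and the target class labels' projections P_t in the semantic space,

    \min_{D, Z} \; \| X_t - D Z \|_F^2 + \lambda_1 \| Z - P_t \|_F^2 + \lambda_2 \| Z \|_1

so the codes learned on the unlabelled target domain are pulled toward the semantic projections, counteracting the projection domain shift.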
Citations: 375
kNN Hashing with Factorized Neighborhood Representation
Pub Date: 2015-12-07 · DOI: 10.1109/ICCV.2015.131
Kun Ding, Chunlei Huo, Bin Fan, Chunhong Pan
Hashing is very effective for many tasks in reducing processing time and in compressing massive databases. Although many approaches have been developed in recent years to learn data-dependent hash functions, how to learn hash functions that yield good performance with acceptable computational and memory cost is still a challenging problem. Based on the observation that retrieval precision is highly related to kNN classification accuracy, this paper proposes a novel kNN-based supervised hashing method, which learns hash functions by directly maximizing the kNN accuracy of the Hamming-embedded training data. To make it scale well to large problems, we propose a factorized neighborhood representation to parsimoniously model the neighborhood relationships inherent in the training data. Considering that real-world data are often linearly inseparable, we further kernelize this basic model to improve its performance. As a result, the proposed method is able to learn accurate hash functions with tolerable computation and storage cost. Experiments on four benchmarks demonstrate that our method outperforms the state of the art.
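The premise that retrieval quality tracks kNN accuracy in Hamming space is easy to make concrete; a minimal retrieval sketch over packed binary codes (the byte packing and lookup-table popcount are our illustrative choices):

    import numpy as np

    # per-byte popcount lookup table
    POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

    def hamming_knn(query, db, k=5):
        """Indices of the k codes in `db` nearest to `query` in Hamming distance.
        query: (n_bytes,) uint8 packed bits; db: (N, n_bytes) uint8."""
        dists = POPCOUNT[np.bitwise_xor(db, query)].sum(axis=1)
        return np.argsort(dists, kind="stable")[:k]

    # usage: 64-bit codes for a toy database of 10,000 items
    rng = np.random.default_rng(0)
    db = rng.integers(0, 256, size=(10000, 8), dtype=np.uint8)
    q = rng.integers(0, 256, size=8, dtype=np.uint8)
    print(hamming_knn(q, db, k=3))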
Citations: 14