
2015 IEEE International Conference on Computer Vision (ICCV): Latest Papers

A Collaborative Filtering Approach to Real-Time Hand Pose Estimation
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.269
Chiho Choi, Ayan Sinha, J. H. Choi, Sujin Jang, K. Ramani
Collaborative filtering aims to predict unknown user ratings in a recommender system by collectively assessing known user preferences. In this paper, we first draw analogies between collaborative filtering and the pose estimation problem. Specifically, we recast the hand pose estimation problem as the cold-start problem for a new user with unknown item ratings in a recommender system. Inspired by fast and accurate matrix factorization techniques for collaborative filtering, we develop a real-time algorithm for estimating the hand pose from the RGB-D data of a commercial depth camera. First, we efficiently identify nearest neighbors from a library of hand poses with known pose parameter values, using local shape descriptors in the RGB-D domain. We then use this information to estimate the unknown pose parameters with a joint matrix factorization and completion (JMFC) approach. Our quantitative and qualitative results suggest that our approach is robust to variation in hand configurations while achieving real-time performance (≈29 FPS) on a standard computer.
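The two-stage pipeline in this abstract has a nearest-neighbour retrieval step followed by filling in a new "user's" unknown entries. Below is a toy sketch of it. It is not the authors' JMFC. As a hypothetical stand-in, it assumes that the stacked [descriptor | pose] rows of the neighbours lie in a low-dimensional linear space. Under that assumption, the query descriptor is written as a ridge least-squares combination of neighbour descriptors, and the same weights are reused for the pose columns. The descriptor vectors, `k` and `lam` are illustrative assumptions.

```python
import math

def knn(query, library, k):
    """Indices of the k library entries whose descriptors are closest to query (Euclidean)."""
    dists = [(math.dist(query, desc), i) for i, (desc, _) in enumerate(library)]
    return [i for _, i in sorted(dists)[:k]]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def complete_pose(query, library, k=3, lam=1e-6):
    """Fill in the unknown pose 'ratings' of a new user (the query hand) from its k neighbours.

    library: list of (descriptor, pose_parameters) pairs with known poses.
    Weights w minimise ||sum_j w_j d_j - q||^2 + lam * ||w||^2; the same weights
    are then applied to the neighbours' known pose parameters.
    """
    idx = knn(query, library, k)
    descs = [library[i][0] for i in idx]
    poses = [library[i][1] for i in idx]
    # Normal equations (D D^T + lam I) w = D q, where the rows of D are neighbour descriptors.
    gram = [[sum(x * y for x, y in zip(di, dj)) + (lam if i == j else 0.0)
             for j, dj in enumerate(descs)] for i, di in enumerate(descs)]
    rhs = [sum(x * y for x, y in zip(di, query)) for di in descs]
    w = solve(gram, rhs)
    return [sum(wj * p[c] for wj, p in zip(w, poses)) for c in range(len(poses[0]))]
```

Usage: if the library poses depend linearly on the descriptors, the prediction for a query inside the neighbours' span recovers that dependence up to the small ridge bias.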
Citations: 57
Efficient Video Segmentation Using Parametric Graph Partitioning
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.361
Chen-Ping Yu, Hieu M. Le, G. Zelinsky, D. Samaras
Video segmentation is the task of grouping similar pixels in the spatio-temporal domain and has become an important preprocessing step for subsequent video analysis. Most video segmentation and supervoxel methods output a hierarchy of segmentations. This provides useful multiscale information, but it also makes it harder to select the appropriate level for a given task. In this work, we propose an efficient and robust video segmentation framework based on parametric graph partitioning (PGP). PGP is a fast, almost parameter-free graph partitioning method that identifies and removes between-cluster edges to form node clusters. Apart from being computationally efficient, PGP clusters the spatio-temporal volume without requiring a pre-specified number of clusters or bandwidth parameters, which makes video segmentation more practical in applications. The PGP framework also allows processing sub-volumes, which further improves performance; in other streaming video segmentation methods, by contrast, sub-volume processing reduces performance. We evaluate the PGP method on the SegTrack v2 and Chen Xiph.org datasets and show that it outperforms related state-of-the-art algorithms in 3D segmentation metrics and running time.
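The core idea is to partition a graph by removing between-cluster edges and then read the clusters off as connected components. It can be sketched in a few lines. In the paper, PGP decides which edges to cut by modelling the edge-weight distribution parametrically. In this sketch a fixed dissimilarity threshold `tau` stands in for that step (an assumption), and union-find labels the surviving components.

```python
def partition(num_nodes, edges, tau):
    """Cluster nodes by discarding edges whose dissimilarity exceeds tau.

    edges: list of (u, v, dissimilarity). Edges above tau are treated as
    between-cluster edges and removed; the remaining edges are merged with
    union-find, so each connected component becomes one cluster.
    Returns node -> cluster id (0..C-1, numbered in order of first appearance).
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v, w in edges:
        if w <= tau:  # keep only edges judged to be within-cluster
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    ids, labels = {}, []
    for node in range(num_nodes):
        labels.append(ids.setdefault(find(node), len(ids)))
    return labels
```

Note that the number of clusters is never specified; it falls out of which edges survive. This mirrors the "no pre-specified cluster number" property described above.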
Citations: 30
Adaptive Hashing for Fast Similarity Search
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.125
Fatih Çakir, S. Sclaroff
With the staggering growth of image and video datasets, algorithms that provide fast similarity search and compact storage are crucial. Hashing methods that map the data into Hamming space have shown promise. However, many of these methods employ a batch-learning strategy whose computational cost and memory requirements can become intractable as datasets grow larger. To overcome these challenges, we propose an online learning algorithm based on stochastic gradient descent in which the hash functions are updated iteratively with streaming data. In experiments on three image retrieval benchmarks, our online algorithm attains retrieval accuracy comparable to competing state-of-the-art batch-learning solutions. At the same time, our formulation is orders of magnitude faster and, being online, adapts to variations in the data. Moreover, our formulation yields better retrieval performance than a recently reported online hashing technique, Online Kernel Hashing.
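The following toy online learner makes the batch-versus-online distinction concrete. It learns linear hash functions and updates them by stochastic gradient descent, one streaming pair at a time. It is a generic illustration, not the paper's loss. It assumes bits of the form h_k(x) = [w_k · x >= 0] and relaxes them with tanh during training. A squared loss on the normalised inner product of the relaxed codes then pushes similar pairs together and dissimilar pairs apart. The learning rate, the bit count and the loss itself are assumptions.

```python
import math
import random

class OnlineHasher:
    """Linear hash functions h_k(x) = 1 if w_k . x >= 0 else 0, learned online by SGD."""

    def __init__(self, dim, n_bits, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.lr = lr

    def _proj(self, x):
        return [sum(wi * xi for wi, xi in zip(wk, x)) for wk in self.w]

    def hash(self, x):
        """Binary code used at query time."""
        return [1 if p >= 0 else 0 for p in self._proj(x)]

    def update(self, x, y, similar):
        """One SGD step on a single streaming pair; returns the loss before the step."""
        n = len(self.w)
        bx = [math.tanh(p) for p in self._proj(x)]  # relaxed codes in (-1, 1)
        by = [math.tanh(p) for p in self._proj(y)]
        target = 1.0 if similar else -1.0
        err = sum(a * b for a, b in zip(bx, by)) / n - target
        coef = 2 * err / n  # derivative of err**2 w.r.t. the code inner product
        for k, wk in enumerate(self.w):
            gx = (1 - bx[k] ** 2) * by[k]  # partial of bx_k*by_k w.r.t. w_k . x
            gy = (1 - by[k] ** 2) * bx[k]  # partial of bx_k*by_k w.r.t. w_k . y
            for i in range(len(wk)):
                wk[i] -= self.lr * coef * (gx * x[i] + gy * y[i])
        return err ** 2

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(u != v for u, v in zip(a, b))
```

Each update touches only the current pair, so memory stays constant no matter how long the stream runs. This is the property that batch-learning methods lack.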
Citations: 75
Illumination Robust Color Naming via Label Propagation
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.78
Yuanliu Liu, Zejian Yuan, Badong Chen, Jianru Xue, Nanning Zheng
Color composition is an important property for many computer vision tasks such as image retrieval and object classification. In this paper we address the problem of inferring the color composition of the intrinsic reflectance of objects, where shadows and highlights may change the observed color dramatically. We achieve this through color label propagation, without first recovering the intrinsic reflectance. Specifically, color labels are propagated between regions sharing the same reflectance. Propagation is encouraged to flow from regions under full illumination and normal viewing angles toward abnormal regions. We detect shadowed and highlighted regions, as well as pairs of regions that have similar reflectance. A joint inference process is adopted to trim inconsistent identities and connections. For evaluation, we collect three datasets of images with noticeable highlights and shadows. Experimental results show that our model can effectively describe the color composition of real-world images.
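Directed label propagation of this kind can be sketched on a region graph. Regions judged reliable (fully lit, normal viewing angle) keep their observed colour names. Labels then flow along edges that link regions of similar reflectance, always from already-resolved regions to unresolved ones. The region graph, the `reliable` flags and the majority-vote rule are illustrative assumptions. The paper's joint inference step, which trims inconsistent identities and connections, is not reproduced.

```python
from collections import Counter

def propagate_labels(labels, reliable, same_reflectance, max_iters=100):
    """Propagate colour names from reliable regions to shadowed/highlighted ones.

    labels: observed colour name per region (may be wrong for unreliable regions).
    reliable: True for regions under full illumination and a normal view angle.
    same_reflectance: (i, j) region pairs believed to share the same reflectance.
    Reliable regions keep their labels. In each round, every unresolved region
    adopts the majority label of its already-resolved neighbours, so labels
    spread outward from the reliable regions. Unreached regions keep their
    observed label.
    """
    n = len(labels)
    nbrs = [[] for _ in range(n)]
    for i, j in same_reflectance:
        nbrs[i].append(j)
        nbrs[j].append(i)
    resolved = {i: labels[i] for i in range(n) if reliable[i]}
    for _ in range(max_iters):
        updates = {}
        for i in range(n):
            if i in resolved:
                continue
            votes = Counter(resolved[j] for j in nbrs[i] if j in resolved)
            if votes:
                updates[i] = votes.most_common(1)[0][0]
        if not updates:
            break
        resolved.update(updates)
    return [resolved.get(i, labels[i]) for i in range(n)]
```

For example, a shadowed region observed as "black" that shares reflectance with a fully lit "red" region is relabelled "red". The reverse direction is never used.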
Citations: 13
A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.434
Kang Liu, Bingpeng Ma, Wei Zhang, Rui Huang
Pedestrian re-identification is a difficult problem due to the large variations in a person's appearance caused by different poses and viewpoints, illumination changes, and occlusions. Spatial alignment is commonly used to address these issues by treating the appearance of different body parts independently. However, a body part can also appear differently during different phases of an action. In this paper we consider the temporal alignment problem in addition to the spatial one. We propose a new approach that takes a video of a walking person as input and builds a spatio-temporal appearance representation for pedestrian re-identification. In particular, given a video sequence, we exploit the periodicity exhibited by a walking person to generate a spatio-temporal body-action model. This model consists of a series of body-action units corresponding to certain action primitives of certain body parts. Fisher vectors are learned and extracted from individual body-action units and concatenated into the final representation of the walking person. Previous spatio-temporal features only take into account local dynamic appearance information; our representation, in contrast, aligns the spatio-temporal appearance of a pedestrian globally. Extensive experiments on public datasets show the effectiveness of our approach compared with the state of the art.
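The first ingredient is exploiting the periodicity of walking. It can be illustrated by estimating the gait period from a 1-D motion signal with autocorrelation. The choice of signal (for instance, a per-frame measure of leg motion) and the lag search range are assumptions. The paper's body-action units and Fisher-vector encoding are not shown.

```python
import math

def gait_period(signal, min_lag, max_lag):
    """Estimate the dominant period (in frames) as the lag in [min_lag, max_lag]
    that maximises the mean-removed, length-normalised autocorrelation."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]

    def acf(k):
        # Average product of the signal with itself shifted by k frames.
        return sum(x[t] * x[t + k] for t in range(n - k)) / (n - k)

    return max(range(min_lag, max_lag + 1), key=acf)
```

Restricting `max_lag` to below twice the expected period avoids confusing the period with its multiples, which also produce high autocorrelation.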
Citations: 239
Enhancing Road Maps by Parsing Aerial Images Around the World
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.197
G. Máttyus, Shenlong Wang, S. Fidler, R. Urtasun
In recent years, contextual models that exploit maps have been shown to be very effective for many recognition and localization tasks. In this paper we propose to exploit aerial images in order to enhance freely available world maps. Towards this goal, we make use of OpenStreetMap and formulate the problem as inference in a Markov random field. The field is parameterized in terms of the location of the road-segment centerlines as well as their width. This parameterization enables very efficient inference and returns only topologically correct roads. In particular, we can segment all OSM roads in the whole world in a single day using a small cluster of 10 computers. Importantly, our approach generalizes very well: it can be trained using only 1.5 km² of aerial imagery and produces very accurate results in any location across the globe. We demonstrate that our approach outperforms the state of the art on two new benchmarks that we collected. We then show how our enhanced maps benefit semantic segmentation of ground images.
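Road segments along a single OSM road form a chain. A simplified version of this inference can therefore be solved exactly with min-sum dynamic programming (Viterbi). This is a toy under stated assumptions. Each segment picks one discrete width label, and `unary[i][w]` is a hypothetical image-evidence cost. Consecutive segments pay `smooth * |w_i - w_j|` for changing width. The paper's MRF also parameterizes the centerline location, which is omitted here.

```python
def chain_map(unary, smooth):
    """Exact MAP labelling of a chain MRF by min-sum dynamic programming.

    unary[i][w]: cost of giving segment i width label w.
    Pairwise cost between consecutive segments: smooth * |w_i - w_{i+1}|.
    Returns (labels, minimum energy).
    """
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])  # best energy of segments 0..i ending in each label
    back = []              # back-pointers for recovering the argmin
    for i in range(1, n):
        new_cost, ptr = [], []
        for w in range(k):
            best_prev = min(range(k), key=lambda p: cost[p] + smooth * abs(p - w))
            new_cost.append(unary[i][w] + cost[best_prev] + smooth * abs(best_prev - w))
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)
    last = min(range(k), key=lambda w: cost[w])
    labels = [last]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, cost[last]
```

The cost is O(n·k²) for n segments and k labels. Chain and tree structures are what make this kind of exact inference cheap.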
Citations: 115
Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.160
Niloufar Pourian, S. Karthikeyan, B. S. Manjunath
We present a weakly supervised approach to semantic segmentation. The goal is to assign pixel-level labels given only partial information, for example, image-level labels. This is an important problem in many application scenarios where accurate segmentation is difficult to obtain or detailed annotations are not feasible to collect. The proposed approach starts with an initial coarse segmentation, followed by a spectral clustering approach that groups related image parts into communities. A community-driven graph is then constructed that captures spatial and feature relationships between communities, while a label graph captures correlations between image labels. Finally, mapping the image-level labels to appropriate communities is formulated as a convex optimization problem. The proposed approach does not require location information for image-level labels and can be trained using partially labeled datasets. Compared to state-of-the-art weakly supervised approaches, we achieve significant performance improvements of 9% on the MSRC-21 dataset and 11% on the LabelMe dataset, while being more than 300 times faster.
Citations: 37
A Multiscale Variable-Grouping Framework for MRF Energy Minimization
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.210
Omer Meir, M. Galun, Stav Yagev, R. Basri, I. Yavneh
We present a multiscale approach for minimizing the energy associated with Markov Random Fields (MRFs) with energy functions that include arbitrary pairwise potentials. The MRF is represented on a hierarchy of successively coarser scales, where the problem on each scale is itself an MRF with suitably defined potentials. These representations are used to construct an efficient multiscale algorithm that seeks a minimal-energy solution to the original problem. The algorithm is iterative and features a bidirectional crosstalk between fine and coarse representations. We use consistency criteria to guarantee that the energy is nonincreasing throughout the iterative process. The algorithm is evaluated on real-world datasets, achieving competitive performance in relatively short run-times.
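The guarantee that the energy never increases can be illustrated with a toy two-level scheme on a pairwise MRF. A coarse move relabels a whole group of variables at once, and a fine move relabels a single variable. A move is kept only if it lowers the energy. The fixed `groups`, greedy acceptance and the Potts potential in the example are assumptions. The paper's construction of coarse-scale potentials and its fine/coarse crosstalk are not reproduced.

```python
def energy(labels, unary, pairwise, edges):
    """E(x) = sum_i unary[i][x_i] + sum_{(i,j) in edges} pairwise(x_i, x_j)."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    return e + sum(pairwise(labels[i], labels[j]) for i, j in edges)

def multiscale_descent(labels, unary, pairwise, edges, groups, sweeps=5):
    """Alternate coarse (whole-group) and fine (single-variable) relabelling moves.

    A move is accepted only if it strictly lowers the energy, so the returned
    energy trace is non-increasing by construction.
    """
    labels = list(labels)
    k = len(unary[0])
    trace = [energy(labels, unary, pairwise, edges)]
    for _ in range(sweeps):
        for block in groups + [[i] for i in range(len(labels))]:  # coarse, then fine
            for l in range(k):
                trial = list(labels)
                for i in block:
                    trial[i] = l
                e = energy(trial, unary, pairwise, edges)
                if e < trace[-1]:
                    labels = trial
                    trace.append(e)
    return labels, trace
```

With a strong Potts coupling, single-variable moves alone can get stuck in a local minimum that one group move escapes. The test below constructs exactly this case.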
Citations: 1
Per-Sample Kernel Adaptation for Visual Recognition and Grouping
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.148
Borislav Antic, B. Ommer
Object, action, or scene representations that are corrupted by noise significantly impair the performance of visual recognition. Typically, partial occlusion, clutter, or excessive articulation affects only a subset of all feature dimensions and, most importantly, different dimensions are corrupted in different samples. Nevertheless, the common approach to this problem in feature selection and kernel methods is to down-weight or eliminate entire training samples or the same dimensions of all samples. Thus, valuable signal is lost, resulting in suboptimal classification. Our goal is, therefore, to adjust the contribution of individual feature dimensions when comparing any two samples and computing their similarity. Consequently, per-sample selection of informative dimensions is directly integrated into kernel computation. The interrelated problems of learning the parameters of a kernel classifier and determining the informative components of each sample are then addressed in a joint objective function. The approach can be integrated into the learning stage of any kernel-based visual recognition problem and it does not affect the computational performance in the retrieval phase. Experiments on diverse challenges of action recognition in videos and indoor scene classification show the general applicability of the approach and its ability to improve learning of visual representations.
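The idea that each sample decides which of its own dimensions count can be written as a kernel with per-sample dimension weights. As a hypothetical form (not necessarily the authors'), weight vectors `u` and `v` in [0, 1] scale each dimension's contribution to a Gaussian RBF kernel through their product. A dimension corrupted in either sample is then suppressed only in comparisons involving that sample. In the paper the weights are learned jointly with the classifier; here they are simply given. A kernel built this way need not be positive semidefinite in general, which a real method has to account for.

```python
import math

def per_sample_rbf(x, y, u, v, gamma=1.0):
    """k(x, y) = exp(-gamma * sum_d u_d * v_d * (x_d - y_d)^2).

    u, v: per-sample dimension weights for x and y (1 = informative, 0 = corrupted).
    With all weights equal to 1 this reduces to the ordinary RBF kernel.
    """
    s = sum(ud * vd * (xd - yd) ** 2 for xd, yd, ud, vd in zip(x, y, u, v))
    return math.exp(-gamma * s)
```

Zeroing one dimension for one sample leaves every other dimension, and every other sample, untouched. This is the contrast the abstract draws with dropping whole samples or whole dimensions.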
Citations: 4
Detection and Segmentation of 2D Curved Reflection Symmetric Structures
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.192
C. L. Teo, C. Fermüller, Y. Aloimonos
Symmetry, as one of the key components of Gestalt theory, provides an important mid-level cue that serves as input to higher visual processes such as segmentation. In this work, we propose a complete approach that links the detection of curved reflection symmetries to the production of symmetry-constrained segments of structures/regions in real, cluttered images. For curved reflection symmetry detection, we leverage patch-based symmetric features to train a Structured Random Forest classifier that detects multiscale curved symmetries in 2D images. Next, we use the symmetry scores of these curved symmetries to modulate a novel symmetry-constrained foreground-background segmentation, so that global symmetric consistency is enforced in the final segmentation. This is achieved by imposing, over an MRF-based representation of the input image edges, a pairwise symmetry prior that encourages symmetric pixels to take the same labels; the final segmentation is obtained via graph cuts. Experimental results on four publicly available datasets containing annotated symmetric structures, namely 1) SYMMAX-300 [38], 2) BSD-Parts, 3) Weizmann Horse (both from [18]), and 4) NY-roads [35], demonstrate the approach's applicability to different environments with state-of-the-art performance.
Citation: C. L. Teo, C. Fermüller, Y. Aloimonos, "Detection and Segmentation of 2D Curved Reflection Symmetric Structures," 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1644-1652, 2015-12-07. DOI: https://doi.org/10.1109/ICCV.2015.192
Citations: 34
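The Teo, Fermüller and Aloimonos abstract above describes its segmentation step only at a high level: a pairwise symmetry prior on an MRF, solved with graph cuts. As a hedged illustration (not the authors' implementation), the sketch below shows the generic construction such a step builds on. It uses a binary energy with unary costs, neighbour smoothness terms, and symmetry terms that charge a penalty when a pixel and its mirror partner take different labels. Because every pairwise term is a non-negative Potts penalty, an s-t minimum cut minimises this energy exactly. The function name `graph_cut_segment`, all costs and weights, and the 5-pixel "scanline" are hypothetical. The paper derives its pixel pairs from detected curved symmetry axes and weights them by symmetry scores, which is not modelled here.

```python
from collections import deque


def graph_cut_segment(cost_fg, cost_bg, pairs):
    """Exactly minimise a binary MRF energy with an s-t minimum cut.

    E(x) = sum_i U_i(x_i) + sum_{(i, j, w) in pairs} w * [x_i != x_j]
    cost_fg[i] / cost_bg[i]: unary cost of labelling pixel i foreground / background.
    pairs: (i, j, w) with w >= 0, e.g. neighbour smoothness or mirror-pixel symmetry terms.
    Returns labels per pixel: 1 = foreground (source side), 0 = background (sink side).
    """
    n = len(cost_fg)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] += cost_bg[i]  # cut when pixel i lands on the sink (background) side
        cap[i][t] += cost_fg[i]  # cut when pixel i stays on the source (foreground) side
    for i, j, w in pairs:
        cap[i][j] += w  # exactly one direction is cut when the two labels differ
        cap[j][i] += w

    # Edmonds-Karp max flow: augment along BFS shortest paths until t is unreachable.
    while True:
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # the final BFS visited every node still reachable from s
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck  # consume forward residual capacity
            cap[v][u] += bottleneck  # open reverse residual capacity
            v = u

    # Pixels reachable from s in the residual graph form the source (foreground) side.
    return [1 if parent[i] != -1 else 0 for i in range(n)]


# Hypothetical 5-pixel scanline mirrored about pixel 2, so pixel i pairs with pixel 4 - i.
cost_fg = [0, 0, 0, 0, 2]  # pixel 4 carries weak, misleading evidence
cost_bg = [4, 4, 4, 4, 1]
smooth = [(i, i + 1, 0.5) for i in range(4)]
symmetric = [(i, 4 - i, 3.0) for i in range(2)]

print(graph_cut_segment(cost_fg, cost_bg, smooth))              # [1, 1, 1, 1, 0]
print(graph_cut_segment(cost_fg, cost_bg, smooth + symmetric))  # [1, 1, 1, 1, 1]
```

Without the mirror pairs, the cheapest labelling leaves pixel 4 as background (energy 1.5). Adding the symmetry terms ties it to its strongly foreground partner, pixel 0, and flips it (energy 2 versus 4.5 for the alternative). The dense capacity matrix keeps the sketch short but needs O(n²) memory, so an image-sized graph would call for a sparse max-flow solver.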