
Latest publications: 2013 IEEE Conference on Computer Vision and Pattern Recognition

Structure Preserving Object Tracking
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.240
Lu Zhang, L. Maaten
Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
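The joint scoring idea above, appearance terms plus a learned spatial constraint between objects, can be illustrated with a toy two-object scorer. This is a minimal sketch under assumed names and a quadratic spring penalty, not the paper's online structured-SVM model:

```python
import numpy as np

def joint_score(app_scores, positions, anchor_offset, spring_weight):
    """Toy joint score for two jointly tracked objects: the sum of their
    appearance scores minus a quadratic 'spring' penalty on the deviation
    of their relative position from a learned anchor offset."""
    positions = np.asarray(positions, float)
    offset = positions[1] - positions[0]
    deformation = spring_weight * np.sum((offset - np.asarray(anchor_offset, float)) ** 2)
    return sum(app_scores) - deformation

def best_joint_config(cand_a, cand_b, score_a, score_b, anchor_offset, spring_weight=0.1):
    """Exhaustively pick the candidate pair that maximises the joint score."""
    best, best_pair = -np.inf, None
    for i, pa in enumerate(cand_a):
        for j, pb in enumerate(cand_b):
            s = joint_score((score_a[i], score_b[j]), (pa, pb), anchor_offset, spring_weight)
            if s > best:
                best, best_pair = s, (i, j)
    return best_pair, best
```

With the spring term active, a slightly weaker appearance candidate that respects the learned relative offset beats a higher-scoring one that violates it, which is the behaviour SPOT relies on to keep similar-looking targets apart.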
Pages: 1838-1845
Citations: 196
Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.157
Ju Shen, S. Cheung
The recent popularity of structured-light depth sensors has enabled many new applications from gesture-based user interface to 3D reconstructions. The quality of the depth measurements of these systems, however, is far from perfect. Some depth values can have significant errors, while others can be missing altogether. The uncertainty in depth measurements among these sensors can significantly degrade the performance of any subsequent vision processing. In this paper, we propose a novel probabilistic model to capture various types of uncertainties in the depth measurement process among structured-light systems. The key to our model is the use of depth layers to account for the differences between foreground objects and background scene, the missing depth value phenomenon, and the correlation between color and depth channels. The depth layer labeling is solved as a maximum a-posteriori estimation problem, and a Markov Random Field attuned to the uncertainty in measurements is used to spatially smooth the labeling process. Using the depth-layer labels, we propose a depth correction and completion algorithm that outperforms other techniques in the literature.
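As a rough illustration of the layer idea (not the paper's probabilistic model), the sketch below labels each valid pixel as foreground or background with a unary depth cost plus a Potts smoothness term, optimizes with a few ICM sweeps, and fills missing pixels from their layer's depth. The two fixed layer depths and all names are assumptions for the example:

```python
import numpy as np

def layer_label_and_fill(depth, fg_depth, bg_depth, missing=0.0, smooth=0.5, iters=5):
    """Toy two-layer depth completion: label each valid pixel as foreground
    or background by a squared-distance data cost, smooth the labels with a
    few ICM sweeps of a Potts term on the 4-neighbour grid, then fill
    missing pixels with their layer's depth."""
    depth = np.asarray(depth, float)
    valid = depth != missing
    cost_fg = np.where(valid, (depth - fg_depth) ** 2, 0.0)
    cost_bg = np.where(valid, (depth - bg_depth) ** 2, 0.0)
    labels = (cost_bg < cost_fg).astype(int)  # 1 = background layer
    for _ in range(iters):
        nb = np.zeros(labels.shape)           # background-labelled neighbours
        nb[1:, :] += labels[:-1, :]; nb[:-1, :] += labels[1:, :]
        nb[:, 1:] += labels[:, :-1]; nb[:, :-1] += labels[:, 1:]
        e_fg = cost_fg + smooth * nb          # penalty for disagreeing neighbours
        e_bg = cost_bg + smooth * (4 - nb)    # (image borders slightly over-counted)
        labels = (e_bg < e_fg).astype(int)
    filled = depth.copy()
    filled[~valid] = np.where(labels[~valid] == 1, bg_depth, fg_depth)
    return labels, filled
```

A hole surrounded by background pixels is pulled into the background layer by the smoothness term and filled with that layer's depth, rather than inheriting an arbitrary value.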
Pages: 1187-1194
Citations: 139
Spatial Inference Machines
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.384
Roman Shapovalov, D. Vetrov, Pushmeet Kohli
This paper addresses the problem of semantic segmentation of 3D point clouds. We extend the inference machines framework of Ross et al. by adding spatial factors that model mid-range and long-range dependencies inherent in the data. The new model is able to account for semantic spatial context. During training, our method automatically isolates and retains factors modelling spatial dependencies between variables that are relevant for achieving higher prediction accuracy. We evaluate the proposed method by using it to predict 17-category semantic segmentations on sets of stitched Kinect scans. Experimental results show that the spatial dependencies learned by our method significantly improve the accuracy of segmentation. They also show that our method outperforms the existing segmentation technique of Koppula et al.
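The inference-machine structure described above can be sketched as a sequence of predictor applications, where each node's input is its unary feature vector stacked with the mean of its neighbours' current class beliefs. The predictor here is a hand-made stand-in for a trained classifier, and mean aggregation is one simple choice of spatial factor:

```python
import numpy as np

def spatial_inference_machine(unary, adjacency, predictor, rounds=2):
    """Run `rounds` passes of a predictor whose input for each node is its
    unary feature vector stacked with the mean of its neighbours' current
    class beliefs (the spatial factor). Returns the final per-node labels."""
    unary = np.asarray(unary, float)
    adjacency = np.asarray(adjacency, float)
    n, k = unary.shape
    beliefs = np.full((n, k), 1.0 / k)                   # start from uniform beliefs
    deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1.0)
    for _ in range(rounds):
        context = adjacency @ beliefs / deg              # aggregate neighbour beliefs
        beliefs = predictor(np.hstack([unary, context]))
    return beliefs.argmax(axis=1)

def toy_predictor(features):
    """Hand-made stand-in for a trained predictor: add unary scores to the
    neighbour context and renormalise each row into beliefs."""
    k = features.shape[1] // 2
    s = features[:, :k] + features[:, k:]
    return s / s.sum(axis=1, keepdims=True)
```

On a 3-node chain whose middle node weakly prefers the wrong class, the neighbour context flips it within two rounds, the kind of accuracy gain the learned spatial factors are meant to provide.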
Pages: 2985-2992
Citations: 41
Capturing Layers in Image Collections with Componential Models: From the Layered Epitome to the Componential Counting Grid
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.71
A. Perina, N. Jojic
Recently, the Counting Grid (CG) model was developed to represent each input image as a point in a large grid of feature counts. This latent point is a corner of a window of grid points which are all uniformly combined to match the (normalized) feature counts in the image. Being a bag of word model with spatial layout in the latent space, the CG model has superior handling of field of view changes in comparison to other bag of word models, but with the price of being essentially a mixture, mapping each scene to a single window in the grid. In this paper we introduce a family of componential models, dubbed the Componential Counting Grid, whose members represent each input image by multiple latent locations, rather than just one. In this way, we make a substantially more flexible admixture model which captures layers or parts of images and maps them to separate windows in a Counting Grid. We tested the models on scene and place classification where their componential nature helped to extract objects, to capture parallax effects, thus better fitting the data and outperforming Counting Grids and Latent Dirichlet Allocation, especially on sequences taken with wearable cameras.
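The basic mixture-style Counting Grid mapping that the paragraph starts from, matching each image to the single window of grid counts that best explains its histogram, can be sketched directly. The L1 cost here stands in for the model's likelihood, and the componential model generalizes this by letting several windows explain one image:

```python
import numpy as np

def best_window(grid, hist, win):
    """Mixture-style Counting Grid mapping: slide a win x win window over a
    grid of per-cell feature counts and return the position whose
    aggregated, normalised counts best match the image histogram (L1)."""
    grid = np.asarray(grid, float)
    hist = np.asarray(hist, float)
    g1, g2, _ = grid.shape
    best, best_pos = np.inf, None
    for i in range(g1 - win + 1):
        for j in range(g2 - win + 1):
            counts = grid[i:i + win, j:j + win].sum(axis=(0, 1))
            dist = np.abs(counts / counts.sum() - hist).sum()
            if dist < best:
                best, best_pos = dist, (i, j)
    return best_pos, best
```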
Pages: 500-507
Citations: 2
Deep Learning Shape Priors for Object Segmentation
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.244
Fei Chen, Huimin Yu, Roland Hu, Xunxun Zeng
In this paper we introduce a new shape-driven approach for object segmentation. Given a training set of shapes, we first use a deep Boltzmann machine to learn the hierarchical architecture of shape priors. This learned hierarchical architecture is then used to model shape variations of global and local structures in an energetic form. Finally, it is applied to data-driven variational methods to perform object extraction from corrupted data based on a shape probabilistic representation. Experiments demonstrate that our model can be applied to datasets of arbitrary prior shapes, and can cope with image noise and clutter, as well as partial occlusions.
Pages: 1870-1877
Citations: 81
Unsupervised Joint Object Discovery and Segmentation in Internet Images
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.253
Michael Rubinstein, Armand Joulin, J. Kopf, Ce Liu
We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across other images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.
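The key insight, that the common object is salient within its own image and well matched across the others, can be turned into a toy region score. Plain nearest-neighbour descriptor distances stand in for the paper's dense correspondences, and all names are illustrative:

```python
import numpy as np

def common_object_scores(saliency, descriptors, alpha=1.0):
    """Score each candidate region as its saliency within its own image
    minus the mean nearest-neighbour descriptor distance to every other
    image. The common object is salient AND matched everywhere; a salient
    noise object matches nowhere and scores low."""
    descriptors = [[np.asarray(d, float) for d in img] for img in descriptors]
    scores = []
    for i, img in enumerate(descriptors):
        row = []
        for k, d in enumerate(img):
            nn = [min(float(np.linalg.norm(d - e)) for e in other)
                  for j, other in enumerate(descriptors) if j != i]
            row.append(saliency[i][k] - alpha * sum(nn) / len(nn))
        scores.append(row)
    return scores
```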
Pages: 1939-1946
Citations: 370
Online Dominant and Anomalous Behavior Detection in Videos
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.337
M. J. Roshtkhari, M. Levine
We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.
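A flat (non-hierarchical) online codebook conveys the gist: descriptors of spatio-temporal volumes are merged into codewords, frequent codewords come to model dominant behaviour, and a descriptor matching no frequent codeword has low likelihood and is flagged anomalous. The running-mean update and thresholds are illustrative choices, not the paper's exact model:

```python
import numpy as np

def update_codebook(codebook, counts, descriptor, radius):
    """Online codebook update: merge the descriptor into the nearest
    codeword (running mean) if it lies within `radius`, else add a new
    codeword. Frequent codewords represent dominant behaviour."""
    descriptor = np.asarray(descriptor, float)
    if codebook:
        dists = [float(np.linalg.norm(descriptor - c)) for c in codebook]
        i = int(np.argmin(dists))
        if dists[i] <= radius:
            counts[i] += 1
            codebook[i] += (descriptor - codebook[i]) / counts[i]
            return codebook, counts
    codebook.append(descriptor.copy())
    counts.append(1)
    return codebook, counts

def is_anomalous(codebook, counts, descriptor, radius, min_freq):
    """Low likelihood of occurrence: no frequently observed codeword
    lies within `radius` of this descriptor."""
    descriptor = np.asarray(descriptor, float)
    return not any(float(np.linalg.norm(descriptor - c)) <= radius and n >= min_freq
                   for c, n in zip(codebook, counts))
```

Because the codebook is built incrementally, the same pass over the video both learns the dominant behaviours and scores new volumes against them, matching the online setting of the paper.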
Pages: 2611-2618
Citations: 168
Expanded Parts Model for Human Attribute and Action Recognition in Still Images
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.90
Gaurav Sharma, F. Jurie, C. Schmid
We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) 'average' templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.
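The scoring scheme implied above, summing each mined part template's best response over the image, can be sketched for single-channel feature maps. The real model ties parts to learned scale-space locations and trains templates discriminatively; this is only the response-pooling skeleton:

```python
import numpy as np

def part_response(feature_map, template):
    """Best cross-correlation response of one part template over all valid
    placements in a single-channel 2-D feature map."""
    feature_map = np.asarray(feature_map, float)
    template = np.asarray(template, float)
    th, tw = template.shape
    h, w = feature_map.shape
    best = -np.inf
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            best = max(best, float(np.sum(feature_map[y:y + th, x:x + tw] * template)))
    return best

def epm_score(feature_map, templates):
    """Image score: each part template contributes its best response."""
    return sum(part_response(feature_map, t) for t in templates)
```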
Pages: 652-659
Citations: 104
A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.161
Byung-Woo Hong, Zhaojin Lu, G. Sundaramoorthi
In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and are easy-to-implement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multi-label Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables: the labels and the functions are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the state-of-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.
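The alternation the abstract describes, a function step that refits each region and a fully global label step recomputed from the functions, reads naturally in the piecewise-constant special case (the regularity term is dropped here for brevity):

```python
import numpy as np

def multilabel_piecewise_constant(image, k, iters=10):
    """Alternate between a global label step (each pixel takes the label of
    the closest constant) and a function step (each label's constant is
    refit to the mean of its region)."""
    image = np.asarray(image, float)
    constants = np.linspace(image.min(), image.max(), k)
    labels = np.zeros(image.shape, int)
    for _ in range(iters):
        cost = (image[..., None] - constants[None, None, :]) ** 2
        labels = cost.argmin(axis=-1)                 # fully global label update
        for j in range(k):
            mask = labels == j
            if mask.any():
                constants[j] = image[mask].mean()     # refit the region function
    return labels, constants
```

Because labels are recomputed from the functions over the whole image at once, the update is global in exactly the sense the abstract contrasts with more local schemes.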
Pages: 1219-1226
Citations: 7
Exemplar-Based Face Parsing
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.447
Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe L. Lin, Jianchao Yang
In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results, that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.
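The final label-propagation step can be sketched in isolation: each already-warped exemplar segmentation votes per pixel with a per-exemplar weight, and the output label is the argmax of the accumulated votes. The scalar per-exemplar weights here stand in for the paper's trained weighting:

```python
import numpy as np

def propagate_labels(exemplar_maps, weights, n_labels):
    """Pixel-wise label transfer: each warped exemplar segmentation votes
    for its label at every pixel with a per-exemplar weight; the output is
    the argmax of the accumulated votes."""
    maps = [np.asarray(m) for m in exemplar_maps]
    votes = np.zeros(maps[0].shape + (n_labels,))
    for seg, w in zip(maps, weights):
        for label in range(n_labels):
            votes[..., label] += w * (seg == label)
    return votes.argmax(axis=-1)
```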
Pages: 3484-3491
Citations: 141