
Latest publications from the 2013 IEEE Conference on Computer Vision and Pattern Recognition

Action Recognition by Hierarchical Sequence Summarization
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.457
Yale Song, Louis-Philippe Morency, Randall Davis
Recent progress has shown that learning from hierarchical feature representations leads to improvements in various computer vision tasks. Motivated by the observation that human activity data contains information at various temporal resolutions, we present a hierarchical sequence summarization approach for action recognition that learns multiple layers of discriminative feature representations at different temporal granularities. We build up a hierarchy dynamically and recursively by alternating sequence learning and sequence summarization. For sequence learning we use CRFs with latent variables to learn hidden spatio-temporal dynamics; for sequence summarization we group observations that have similar semantic meaning in the latent space. For each layer we learn an abstract feature representation through non-linear gate functions. This procedure is repeated to obtain a hierarchical sequence summary representation. We develop an efficient learning method to train our model and show that its complexity grows sub-linearly with the size of the hierarchy. Experimental results show the effectiveness of our approach, achieving the best published results on the Arm Gesture and Canal9 datasets.
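To make the summarization step concrete, here is a minimal sketch (not the authors' code; the names and the toy latent-inference callable are assumptions) in which per-frame latent labels, such as those inferred by the latent-variable CRF, are used to group consecutive frames and mean-pool their features into the coarser sequence for the next layer:

```python
import numpy as np

def summarize_sequence(features, latent_labels):
    """Group consecutive frames sharing a latent label and mean-pool their
    feature vectors, producing a shorter sequence for the next layer.

    features      : (T, D) array of per-frame feature vectors
    latent_labels : (T,)  array of latent-state assignments from the CRF
    returns       : (T', D) pooled features, T' <= T
    """
    pooled = []
    start = 0
    for t in range(1, len(latent_labels) + 1):
        # close the current group when the label changes or the sequence ends
        if t == len(latent_labels) or latent_labels[t] != latent_labels[start]:
            pooled.append(features[start:t].mean(axis=0))
            start = t
    return np.vstack(pooled)

def build_hierarchy(features, infer_latents, num_layers=3):
    """Alternate (simplified) sequence learning and sequence summarization.

    infer_latents is a stand-in for CRF inference: any callable mapping a
    (T, D) feature sequence to (T,) latent labels.
    """
    layers = [features]
    for _ in range(num_layers - 1):
        labels = infer_latents(layers[-1])
        layers.append(summarize_sequence(layers[-1], labels))
    return layers

# toy usage: "inference" that just thresholds the first feature dimension
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(50, 8))
    layers = build_hierarchy(seq, lambda f: (f[:, 0] > 0).astype(int))
    print([layer.shape for layer in layers])
```

In the full model each layer would also pass its pooled observations through learned non-linear gate functions before the next round of CRF learning; the sketch keeps only the grouping-and-pooling skeleton.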
Citations: 101
Relative Volume Constraints for Single View 3D Reconstruction
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.30
Eno Töppe, C. Nieuwenhuis, D. Cremers
We introduce the concept of relative volume constraints in order to account for insufficient information in the reconstruction of 3D objects from a single image. The key idea is to formulate a variational reconstruction approach with shape priors in the form of relative depth profiles or volume ratios relating object parts. Such shape priors can easily be derived either from a user sketch or from the object's shading profile in the image. They can handle textured or shadowed object regions by propagating information. We propose a convex relaxation of the constrained optimization problem which can be solved optimally in a few seconds on graphics hardware. In contrast to existing single view reconstruction algorithms, the proposed algorithm provides substantially more flexibility to recover shape details such as self-occlusions, dents and holes, which are not visible in the object silhouette.
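One way to see what a relative volume constraint buys is a toy height-field model: if the reconstruction assigns a height to every silhouette pixel, the volume of an object part is just the sum of heights inside its mask, and a user-specified ratio between two parts becomes a single linear relation. The sketch below (illustrative masks and a crude rescaling step, not the convex relaxation solved in the paper) makes that constraint explicit:

```python
import numpy as np

def part_volume(height, mask):
    """Volume of one object part under a height-field model:
    sum of per-pixel heights inside the part's silhouette mask."""
    return float(height[mask].sum())

def enforce_volume_ratio(height, mask_a, mask_b, target_ratio):
    """Rescale the heights of part A so that vol(A) / vol(B) matches the
    user-specified relative volume constraint (a crude projection step,
    just to make the constraint concrete)."""
    vol_a = part_volume(height, mask_a)
    vol_b = part_volume(height, mask_b)
    scale = target_ratio * vol_b / max(vol_a, 1e-9)
    out = height.copy()
    out[mask_a] *= scale
    return out

if __name__ == "__main__":
    h = np.ones((4, 6))                      # toy height field over the silhouette
    a = np.zeros_like(h, dtype=bool)
    a[:, :2] = True                          # "part A" mask
    b = np.zeros_like(h, dtype=bool)
    b[:, 2:] = True                          # "part B" mask
    h2 = enforce_volume_ratio(h, a, b, target_ratio=0.25)
    print(part_volume(h2, a) / part_volume(h2, b))   # -> 0.25
```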
Citations: 15
Spatiotemporal Deformable Part Models for Action Detection
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.341
Yicong Tian, R. Sukthankar, M. Shah
Deformable part models have achieved impressive performance for object detection, even on difficult image datasets. This paper explores the generalization of deformable part models from 2D images to 3D spatiotemporal volumes to better study their effectiveness for action detection in video. Actions are treated as spatiotemporal patterns and a deformable part model is generated for each action from a collection of examples. For each action model, the most discriminative 3D sub-volumes are automatically selected as parts and the spatiotemporal relations between their locations are learned. By focusing on the most distinctive parts of each action, our models adapt to intra-class variation and show robustness to clutter. Extensive experiments on several video datasets demonstrate the strength of spatiotemporal DPMs for classifying and localizing actions.
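The scoring function of a spatiotemporal deformable part model can be sketched as a root filter response at the window origin plus, for each part, the best filter response minus a quadratic deformation cost within a small displacement range around its anchor. The code below is an illustrative, unoptimized version with random filters, not the trained model or the efficient inference used in the paper:

```python
import numpy as np

def filter_response(volume, filt, pos):
    """Correlation of a small 3D filter with the video volume at position
    pos = (t, y, x); returns -inf if the filter does not fit there."""
    t, y, x = pos
    dt, dy, dx = filt.shape
    patch = volume[t:t + dt, y:y + dy, x:x + dx]
    if patch.shape != filt.shape:
        return -np.inf
    return float((patch * filt).sum())

def score_window(volume, root, parts, anchors, defo=0.1, search=2):
    """DPM-style score of a spatiotemporal window anchored at the origin:
    root response plus, for each part, the best (response - deformation
    cost) over displacements within +/- `search` voxels of its anchor."""
    score = filter_response(volume, root, (0, 0, 0))
    for part, anchor in zip(parts, anchors):
        best = -np.inf
        for ddt in range(-search, search + 1):
            for ddy in range(-search, search + 1):
                for ddx in range(-search, search + 1):
                    pos = (anchor[0] + ddt, anchor[1] + ddy, anchor[2] + ddx)
                    if min(pos) < 0:
                        continue
                    cost = defo * (ddt ** 2 + ddy ** 2 + ddx ** 2)
                    best = max(best, filter_response(volume, part, pos) - cost)
        score += best
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    video = rng.normal(size=(16, 32, 32))          # (frames, height, width)
    root = rng.normal(size=(8, 16, 16))
    parts = [rng.normal(size=(4, 8, 8)) for _ in range(2)]
    anchors = [(2, 4, 4), (6, 8, 8)]
    print(score_window(video, root, parts, anchors))
```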
Citations: 268
Fast Convolutional Sparse Coding
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.57
H. Bristow, Anders P. Eriksson, S. Lucey
Sparse coding has become an increasingly popular method in learning and vision for a variety of classification, reconstruction and coding tasks. The canonical approach intrinsically assumes independence between observations during learning. For many natural signals, however, sparse coding is applied to sub-elements (i.e., patches) of the signal, where such an assumption is invalid. Convolutional sparse coding explicitly models local interactions through the convolution operator; however, the resulting optimization problem is considerably more complex than traditional sparse coding. In this paper, we draw upon ideas from signal processing and Augmented Lagrange Methods (ALMs) to produce a fast algorithm with globally optimal subproblems and super-linear convergence.
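The frequency-domain idea can be illustrated with a small, assumed 1D example: under circular boundary conditions convolution is element-wise multiplication after an FFT, so the data-fidelity gradient in a proximal-gradient (ISTA) loop costs only a few FFTs per iteration. This is a generic convolutional sparse coding sketch with a single fixed filter, not the ADMM algorithm derived in the paper:

```python
import numpy as np

def conv_sparse_code(x, d, lam=0.1, step=0.05, iters=400):
    """Solve min_z 0.5*||x - d (*) z||^2 + lam*||z||_1 with circular
    convolution (*) via ISTA; all convolutions are evaluated in the FFT domain.

    x : (N,) signal, d : (N,) filter zero-padded to length N.
    """
    D = np.fft.fft(d)
    X = np.fft.fft(x)
    z = np.zeros_like(x)
    for _ in range(iters):
        Z = np.fft.fft(z)
        residual_ft = D * Z - X                                   # d (*) z - x, in Fourier
        grad = np.real(np.fft.ifft(np.conj(D) * residual_ft))     # adjoint (correlation)
        z = z - step * grad
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    N = 128
    d = np.zeros(N)
    d[:8] = np.hanning(8)                             # short smooth filter
    z_true = np.zeros(N)
    z_true[[20, 70, 100]] = [1.0, -0.5, 0.8]
    x = np.real(np.fft.ifft(np.fft.fft(d) * np.fft.fft(z_true)))
    x += 0.01 * rng.normal(size=N)
    z_hat = conv_sparse_code(x, d, lam=0.05)
    print(np.flatnonzero(np.abs(z_hat) > 0.2))        # should be near 20, 70, 100
```

The paper's ALM-based formulation replaces this simple first-order loop with subproblems that can each be solved exactly in the Fourier domain, which is where the claimed super-linear convergence comes from.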
Citations: 331
3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.437
Ishani Chakraborty, Hui Cheng, O. Javed
We present a unified framework for detecting and classifying people interactions in unconstrained user generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate camera viewpoint and people positions in 3D space and then extract spatial configuration features from explicit 3D people positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than afforded in 2D. Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people size, heights (e.g. age) and poses. An accurate 3D layout also allows us to construct features informed by Proxemics that improve our semantic classification. To characterize the human interaction space, we introduce visual proxemes, a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that classifies 3D arrangements of people into visual proxemes and quantitatively evaluate the performance on a large, challenging dataset.
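The geometric core, going from 2D detections to 3D positions, can be sketched under a standard ground-plane assumption: with a known focal length, camera height and tilt, the image point where a person meets the ground back-projects to a unique point on the ground plane, and relative distances between people follow directly. The pinhole model and the numbers below are assumptions for illustration, not the calibration or proxemic features of the paper:

```python
import numpy as np

def foot_to_ground(u, v, f, cx, cy, cam_height, tilt):
    """Back-project an image foot point (u, v) onto the ground plane y = 0.

    Camera at height `cam_height` above the ground, pitched down by `tilt`
    radians, pinhole model with focal length f and principal point (cx, cy).
    Returns the 3D point (x, 0, z) in a ground-aligned world frame (y down).
    """
    # viewing ray in camera coordinates (x right, y down, z forward)
    ray_cam = np.array([(u - cx) / f, (v - cy) / f, 1.0])
    # rotate the ray into the world frame (camera pitched down by `tilt`)
    c, s = np.cos(tilt), np.sin(tilt)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,   s],
                  [0.0,  -s,   c]])
    ray_world = R @ ray_cam
    cam_pos = np.array([0.0, -cam_height, 0.0])   # above the ground (y points down)
    # intersect the ray with the ground plane y = 0
    t = -cam_pos[1] / ray_world[1]
    return cam_pos + t * ray_world

if __name__ == "__main__":
    f, cx, cy = 800.0, 320.0, 240.0
    p1 = foot_to_ground(250, 400, f, cx, cy, cam_height=1.6, tilt=0.2)
    p2 = foot_to_ground(420, 380, f, cx, cy, cam_height=1.6, tilt=0.2)
    print("interpersonal distance (m):", np.linalg.norm(p1 - p2))
```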
Citations: 21
Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.264
Jeremie Papon, A. Abramov, Markus Schoeler, F. Wörgötter
Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three-dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
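A stripped-down version of connectivity-constrained clustering over a voxel grid is sketched below: points are binned into voxels, seeds are placed on a coarser grid, and voxels are then claimed by seeds through a best-first expansion that only moves between face-adjacent occupied voxels, so a cluster can never jump across empty space. The resolutions are made up and the color/normal terms of the real VCCS distance are omitted for brevity:

```python
import heapq
import numpy as np

def supervoxel_segment(points, voxel_res=0.05, seed_res=0.25):
    """Assign each occupied voxel to a seed by best-first expansion over
    face-adjacent occupied voxels (spatial distance only, for brevity).
    Returns {voxel_index: seed_id}."""
    # 1. voxelize: occupied voxels and their point centroids
    keys = np.floor(points / voxel_res).astype(int)
    occupied = {}
    for key, p in zip(map(tuple, keys), points):
        occupied.setdefault(key, []).append(p)
    centroid = {k: np.mean(v, axis=0) for k, v in occupied.items()}

    # 2. seed voxels on a coarser grid (one seed per occupied coarse cell)
    seeds, seen = [], set()
    for k in occupied:
        coarse = tuple(int(np.floor(c * voxel_res / seed_res)) for c in k)
        if coarse not in seen:
            seen.add(coarse)
            seeds.append(k)

    # 3. best-first (Dijkstra-like) expansion constrained to voxel adjacency
    label = {}
    heap = [(0.0, sid, k) for sid, k in enumerate(seeds)]
    heapq.heapify(heap)
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while heap:
        dist, sid, k = heapq.heappop(heap)
        if k in label:
            continue
        label[k] = sid
        for d in neighbors:
            nk = (k[0] + d[0], k[1] + d[1], k[2] + d[2])
            if nk in occupied and nk not in label:
                step = np.linalg.norm(centroid[nk] - centroid[k])
                heapq.heappush(heap, (dist + step, sid, nk))
    return label

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    cloud = rng.uniform(0, 1, size=(5000, 3))        # toy point cloud
    labels = supervoxel_segment(cloud)
    print(len(labels), "voxels,", len(set(labels.values())), "supervoxels")
```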
Citations: 497
Winding Number for Region-Boundary Consistent Salient Contour Extraction
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.363
Y. Ming, Hongdong Li, Xuming He
This paper aims to extract salient closed contours from an image. For this vision task, both region segmentation cues (e.g. color/texture homogeneity) and boundary detection cues (e.g. local contrast, edge continuity and contour closure) play important and complementary roles. In this paper we show how to combine both cues in a unified framework. The main focus is given to how to maintain the consistency (compatibility) between the region cues and the boundary cues. To this end, we introduce the use of the winding number, a well-known concept in topology, as a powerful mathematical device. By this device, the region-boundary consistency is represented as a set of simple linear relationships. Our method is applied to the figure-ground segmentation problem. The experiments show clearly improved results.
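The winding number itself is simple to compute: for a closed contour and a query point it is the total signed angle the contour sweeps around the point divided by 2π, so interior points of a simple closed curve get ±1 and exterior points get 0, which is what ties region membership to boundary orientation. The following sketch illustrates the concept for a polygonal contour; it is not the paper's linear formulation of region-boundary consistency:

```python
import numpy as np

def winding_number(contour, point):
    """Winding number of a closed polygonal contour (N, 2) around `point`:
    sum of signed angles between consecutive contour vertices, as seen from
    the point, divided by 2*pi."""
    v = contour - np.asarray(point, dtype=float)   # vectors point -> vertices
    v_next = np.roll(v, -1, axis=0)                # next vertex (wraps around)
    cross = v[:, 0] * v_next[:, 1] - v[:, 1] * v_next[:, 0]
    dot = (v * v_next).sum(axis=1)
    angles = np.arctan2(cross, dot)                # signed turn per edge
    return int(round(angles.sum() / (2 * np.pi)))

if __name__ == "__main__":
    # counter-clockwise unit square
    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
    print(winding_number(square, (0.5, 0.5)))   # 1: inside
    print(winding_number(square, (2.0, 0.5)))   # 0: outside
```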
Citations: 10
Expressive Visual Text-to-Speech Using Active Appearance Models
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.434
Robert Anderson, B. Stenger, V. Wan, R. Cipolla
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
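At its core, an AAM is a pair of linear generative models: a shape is the mean shape plus a weighted sum of shape basis vectors, and the appearance is generated the same way before being warped to that shape. The sketch below shows only this linear synthesis step with randomly initialized bases; the warping, expression weighting and the pose/blink normalization extensions described above are omitted, and all names are illustrative:

```python
import numpy as np

class LinearAAM:
    """Minimal linear shape/appearance model: instance = mean + basis @ params."""

    def __init__(self, mean_shape, shape_basis, mean_app, app_basis):
        self.mean_shape = mean_shape      # (2K,)  stacked x,y landmark coordinates
        self.shape_basis = shape_basis    # (2K, n_s)
        self.mean_app = mean_app          # (P,)   vectorized face texture
        self.app_basis = app_basis        # (P, n_a)

    def synthesize(self, shape_params, app_params):
        shape = self.mean_shape + self.shape_basis @ shape_params
        appearance = self.mean_app + self.app_basis @ app_params
        return shape.reshape(-1, 2), appearance    # landmarks (K, 2) + texture

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    K, P, n_s, n_a = 68, 64 * 64, 5, 10            # 68 landmarks, 64x64 texture
    aam = LinearAAM(rng.normal(size=2 * K), rng.normal(size=(2 * K, n_s)),
                    rng.normal(size=P), rng.normal(size=(P, n_a)))
    landmarks, texture = aam.synthesize(rng.normal(size=n_s), rng.normal(size=n_a))
    print(landmarks.shape, texture.shape)          # (68, 2) (4096,)
```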
Citations: 84
Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.39
R. Kakarala, P. Kaliamoorthi, Vittal Premachandran
We show that bilateral symmetry plane estimation for three-dimensional (3-D) shapes may be carried out accurately, and efficiently, in the spherical harmonic domain. Our methods are valuable for applications where spherical harmonic expansion is already employed, such as 3-D shape registration, morphometry, and retrieval. We show that the presence of bilateral symmetry in the 3-D shape is equivalent to a linear phase structure in the corresponding spherical harmonic coefficients, and provide algorithms for estimating the orientation of the symmetry plane. The benefit of using spherical harmonic phase is that symmetry estimation reduces to matching a compact set of descriptors, without the need to solve a correspondence problem. Our methods work on point clouds as well as large-scale mesh models of 3-D shapes.
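The linear-phase property can be made concrete for the special case of a candidate symmetry plane that contains the z-axis: reflecting a real spherical function about the plane at azimuth phi0 maps each complex spherical-harmonic coefficient f_lm to exp(-2i*m*phi0) * conj(f_lm), so bilateral symmetry forces arg(f_lm) to be linear in m. The sketch below scores candidate azimuths by that criterion on precomputed coefficients; it is a simplified illustration under this assumption, not the authors' full 3D estimator:

```python
import numpy as np

def symmetry_score(coeffs, phi0):
    """Score how well the plane through the z-axis at azimuth phi0 explains the
    coefficients: for a bilaterally symmetric real function,
    f_lm = exp(-2j*m*phi0) * conj(f_lm), which is equivalent to maximizing
    sum_lm Re(exp(2j*m*phi0) * f_lm**2).

    coeffs: dict {(l, m): complex coefficient} of a real spherical function.
    """
    return sum(np.real(np.exp(2j * m * phi0) * c ** 2)
               for (l, m), c in coeffs.items())

def estimate_symmetry_azimuth(coeffs, num_candidates=360):
    """Brute-force search for the azimuth of the best-fitting symmetry plane."""
    candidates = np.linspace(0, np.pi, num_candidates, endpoint=False)
    scores = [symmetry_score(coeffs, p) for p in candidates]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    # build coefficients with linear phase, i.e. symmetric about phi0 = 0.7
    rng = np.random.default_rng(5)
    phi_true, coeffs = 0.7, {}
    for l in range(1, 6):
        for m in range(0, l + 1):
            mag = rng.uniform(0.1, 1.0)
            coeffs[(l, m)] = mag * np.exp(-1j * m * phi_true)
            # negative-m coefficients of a real function: f_{l,-m} = (-1)^m conj(f_{l,m})
            coeffs[(l, -m)] = (-1) ** m * np.conj(coeffs[(l, m)])
    print(estimate_symmetry_azimuth(coeffs))   # close to 0.7
```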
Citations: 18
Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.304
Karl Pauwels, Leonardo Rubio, Javier Díaz, E. Ros
We propose a novel model-based method for estimating and tracking the six-degrees-of-freedom (6DOF) pose of rigid objects of arbitrary shapes in real-time. By combining dense motion and stereo cues with sparse key point correspondences, and by feeding back information from the model to the cue extraction level, the method is both highly accurate and robust to noise and occlusions. A tight integration of the graphical and computational capability of Graphics Processing Units (GPUs) results in pose updates at frame rates exceeding 60 Hz. Since a benchmark dataset that enables the evaluation of stereo-vision-based pose estimators in complex scenarios is currently missing in the literature, we have introduced a novel synthetic benchmark dataset with varying objects, background motion, noise and occlusions. Using this dataset and a novel evaluation methodology, we show that the proposed method greatly outperforms state-of-the-art methods. Finally, we demonstrate excellent performance on challenging real-world sequences involving object manipulation.
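How two cue types can drive one pose update is easy to sketch: residuals and Jacobians from a dense term and a sparse term are stacked with per-cue weights and solved jointly for a 6-DOF increment in a single damped Gauss-Newton step. The arrays below are synthetic stand-ins; computing the actual dense motion/stereo and keypoint residuals and their Jacobians, as the paper does on the GPU, is outside this sketch:

```python
import numpy as np

def combined_pose_step(r_dense, J_dense, r_sparse, J_sparse,
                       w_dense=1.0, w_sparse=1.0, damping=1e-6):
    """One damped Gauss-Newton step for a 6-DOF pose increment that jointly
    minimizes weighted dense and sparse residuals:
        delta = argmin  w_d*||r_d + J_d*delta||^2 + w_s*||r_s + J_s*delta||^2
    """
    J = np.vstack([np.sqrt(w_dense) * J_dense, np.sqrt(w_sparse) * J_sparse])
    r = np.concatenate([np.sqrt(w_dense) * r_dense, np.sqrt(w_sparse) * r_sparse])
    H = J.T @ J + damping * np.eye(6)     # approximate Hessian
    g = J.T @ r                           # gradient
    return np.linalg.solve(H, -g)         # 6-vector: rotation + translation update

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    delta_true = 0.01 * rng.normal(size=6)
    J_d, J_s = rng.normal(size=(5000, 6)), rng.normal(size=(40, 6))
    # residuals consistent with a small true pose increment, plus noise
    r_d = -J_d @ delta_true + 0.01 * rng.normal(size=5000)
    r_s = -J_s @ delta_true + 0.001 * rng.normal(size=40)
    print(combined_pose_step(r_d, J_d, r_s, J_s, w_dense=1.0, w_sparse=10.0))
    print(delta_true)                     # the estimate should be close to this
```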
Citations: 77