Pose variation remains one of the major factors that adversely affect the accuracy of person re-identification. Such variation is not arbitrary, as body parts (e.g. head, torso, legs) have relatively stable spatial distributions. Decomposing the variability of global appearance according to this spatial distribution can therefore benefit person matching. We thus learn a novel similarity function that consists of multiple sub-similarity measurements, each responsible for one subregion. In particular, we take advantage of the recently proposed polynomial feature map to describe the matching within each subregion, and combine all the feature maps in a unified framework. The framework not only outputs similarity measurements for the different regions, but also enforces better consistency among them. It can combine local similarities with a global similarity to exploit their complementary strengths, and is flexible enough to incorporate multiple visual cues to further improve performance. In experiments, we analyze the effectiveness of the major components. Results on four datasets show significant and consistent improvements over state-of-the-art methods.
{"title":"Similarity Learning with Spatial Constraints for Person Re-identification","authors":"Dapeng Chen, Zejian Yuan, Badong Chen, Nanning Zheng","doi":"10.1109/CVPR.2016.142","DOIUrl":"https://doi.org/10.1109/CVPR.2016.142","url":null,"abstract":"Pose variation remains one of the major factors that adversely affect the accuracy of person re-identification. Such variation is not arbitrary as body parts (e.g. head, torso, legs) have relative stable spatial distribution. Breaking down the variability of global appearance regarding the spatial distribution potentially benefits the person matching. We therefore learn a novel similarity function, which consists of multiple sub-similarity measurements with each taking in charge of a subregion. In particular, we take advantage of the recently proposed polynomial feature map to describe the matching within each subregion, and inject all the feature maps into a unified framework. The framework not only outputs similarity measurements for different regions, but also makes a better consistency among them. Our framework can collaborate local similarities as well as global similarity to exploit their complementary strength. It is flexible to incorporate multiple visual cues to further elevate the performance. In experiments, we analyze the effectiveness of the major components. The results on four datasets show significant and consistent improvements over the state-of-the-art methods.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"49 1","pages":"1268-1277"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80715111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The cascade regression framework has been shown to be effective for facial landmark detection. It starts from an initial face shape and gradually predicts face shape updates from local appearance features to generate the facial landmark locations for the next iteration, until convergence. In this paper, we improve upon the cascade regression framework and propose the Constrained Joint Cascade Regression Framework (CJCRF) for simultaneous facial action unit recognition and facial landmark detection, two related face analysis tasks that are seldom exploited together. In particular, we first learn the relationships among facial action units and face shapes as a constraint. Then, in the proposed constrained joint cascade regression framework, with the help of this constraint, we iteratively update the facial landmark locations and the action unit activation probabilities until convergence. Experimental results demonstrate that the intertwined relationships between facial action units and face shapes boost the performance of both facial action unit recognition and facial landmark detection. The experiments also demonstrate the effectiveness of the proposed method compared to state-of-the-art works.
{"title":"Constrained Joint Cascade Regression Framework for Simultaneous Facial Action Unit Recognition and Facial Landmark Detection","authors":"Yue Wu, Q. Ji","doi":"10.1109/CVPR.2016.370","DOIUrl":"https://doi.org/10.1109/CVPR.2016.370","url":null,"abstract":"Cascade regression framework has been shown to be effective for facial landmark detection. It starts from an initial face shape and gradually predicts the face shape update from the local appearance features to generate the facial landmark locations in the next iteration until convergence. In this paper, we improve upon the cascade regression framework and propose the Constrained Joint Cascade Regression Framework (CJCRF) for simultaneous facial action unit recognition and facial landmark detection, which are two related face analysis tasks, but are seldomly exploited together. In particular, we first learn the relationships among facial action units and face shapes as a constraint. Then, in the proposed constrained joint cascade regression framework, with the help from the constraint, we iteratively update the facial landmark locations and the action unit activation probabilities until convergence. Experimental results demonstrate that the intertwined relationships of facial action units and face shapes boost the performances of both facial action unit recognition and facial landmark detection. The experimental results also demonstrate the effectiveness of the proposed method comparing to the state-of-the-art works.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"78 1","pages":"3400-3408"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81284668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Affine Correspondences (ACs) are more informative than the Point Correspondences (PCs) used as input by mainstream Structure-from-Motion (SfM) algorithms. Since ACs enable models to be estimated from fewer correspondences, their use can dramatically reduce the number of combinations in the iterative sample-and-test step that exists in most SfM pipelines. However, using ACs instead of PCs as input for SfM requires fully understanding the relations between ACs and multi-view geometry, as well as establishing practical, effective AC-based algorithms. This article is a step in this direction: it provides a clear account of how ACs constrain the two-view geometry, and proposes new algorithms for plane segmentation and visual odometry that compare favourably with methods relying on PCs.
{"title":"Theory and Practice of Structure-From-Motion Using Affine Correspondences","authors":"Carolina Raposo, J. Barreto","doi":"10.1109/CVPR.2016.590","DOIUrl":"https://doi.org/10.1109/CVPR.2016.590","url":null,"abstract":"Affine Correspondences (ACs) are more informative than Point Correspondences (PCs) that are used as input in mainstream algorithms for Structure-from-Motion (SfM). Since ACs enable to estimate models from fewer correspondences, its use can dramatically reduce the number of combinations during the iterative step of sample-and-test that exists in most SfM pipelines. However, using ACs instead of PCs as input for SfM passes by fully understanding the relations between ACs and multi-view geometry, as well as by establishing practical, effective AC-based algorithms. This article is a step forward into this direction, by providing a clear account about how ACs constrain the two-view geometry, and by proposing new algorithms for plane segmentation and visual odometry that compare favourably with respect to methods relying in PCs.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"11 1","pages":"5470-5478"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87592964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Faqiang Wang, W. Zuo, Liang Lin, D. Zhang, Lei Zhang
Person re-identification has usually been solved as either the matching of single-image representations (SIR) or the classification of cross-image representations (CIR). In this work, we exploit the connection between these two categories of methods and propose a joint learning framework to unify SIR and CIR using a convolutional neural network (CNN). Specifically, our deep architecture contains one shared sub-network together with two sub-networks that extract the SIRs of given images and the CIRs of given image pairs, respectively. The SIR sub-network needs to be computed only once per image (in both the probe and gallery sets), while the CIR sub-network is kept shallow to reduce the computational burden. Therefore, the two types of representation can be jointly optimized to pursue better matching accuracy at moderate computational cost. Furthermore, the representations learned with pairwise comparison and triplet comparison objectives can be combined to improve matching performance. Experiments on the CUHK03, CUHK01 and VIPeR datasets show that the proposed method achieves favorable accuracy compared with state-of-the-art methods.
{"title":"Joint Learning of Single-Image and Cross-Image Representations for Person Re-identification","authors":"Faqiang Wang, W. Zuo, Liang Lin, D. Zhang, Lei Zhang","doi":"10.1109/CVPR.2016.144","DOIUrl":"https://doi.org/10.1109/CVPR.2016.144","url":null,"abstract":"Person re-identification has been usually solved as either the matching of single-image representation (SIR) or the classification of cross-image representation (CIR). In this work, we exploit the connection between these two categories of methods, and propose a joint learning frame-work to unify SIR and CIR using convolutional neural network (CNN). Specifically, our deep architecture contains one shared sub-network together with two sub-networks that extract the SIRs of given images and the CIRs of given image pairs, respectively. The SIR sub-network is required to be computed once for each image (in both the probe and gallery sets), and the depth of the CIR sub-network is required to be minimal to reduce computational burden. Therefore, the two types of representation can be jointly optimized for pursuing better matching accuracy with moderate computational cost. Furthermore, the representations learned with pairwise comparison and triplet comparison objectives can be combined to improve matching performance. Experiments on the CUHK03, CUHK01 and VIPeR datasets show that the proposed method can achieve favorable accuracy while compared with state-of-the-arts.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"1288-1296"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85207450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a novel approach to the challenging problem of online action localization, which entails predicting actions and their locations as they happen in a video. Typically, action localization or recognition is performed in an offline manner, where all the frames in the video are processed together and action labels are not predicted for the future. This prevents timely localization of actions, an important consideration for surveillance tasks. In our approach, given a batch of frames from the immediate past in a video, we estimate pose and over-segment the current frame into superpixels. Next, we discriminatively train an actor foreground model on the superpixels using the pose bounding boxes. A Conditional Random Field with superpixels as nodes and edges connecting spatio-temporal neighbors is used to obtain action segments. The action confidence is predicted using dynamic programming on SVM scores obtained on short segments of the video, thereby capturing the sequential information of the actions. The issue of visual drift is handled by updating the appearance model and refining the pose in an online manner. Lastly, we introduce a new measure to quantify the performance of action prediction (i.e. online action localization), which analyzes how prediction accuracy varies as a function of the observed portion of the video. Our experiments suggest that, despite using only a few frames to localize actions at each time instant, we are able to predict the action and obtain results competitive with state-of-the-art offline methods.
{"title":"Predicting the Where and What of Actors and Actions through Online Action Localization","authors":"K. Soomro, Haroon Idrees, M. Shah","doi":"10.1109/CVPR.2016.290","DOIUrl":"https://doi.org/10.1109/CVPR.2016.290","url":null,"abstract":"This paper proposes a novel approach to tackle the challenging problem of 'online action localization' which entails predicting actions and their locations as they happen in a video. Typically, action localization or recognition is performed in an offline manner where all the frames in the video are processed together and action labels are not predicted for the future. This disallows timely localization of actions - an important consideration for surveillance tasks. In our approach, given a batch of frames from the immediate past in a video, we estimate pose and oversegment the current frame into superpixels. Next, we discriminatively train an actor foreground model on the superpixels using the pose bounding boxes. A Conditional Random Field with superpixels as nodes, and edges connecting spatio-temporal neighbors is used to obtain action segments. The action confidence is predicted using dynamic programming on SVM scores obtained on short segments of the video, thereby capturing sequential information of the actions. The issue of visual drift is handled by updating the appearance model and pose refinement in an online manner. Lastly, we introduce a new measure to quantify the performance of action prediction (i.e. online action localization), which analyzes how the prediction accuracy varies as a function of observed portion of the video. Our experiments suggest that despite using only a few frames to localize actions at each time instant, we are able to predict the action and obtain competitive results to state-of-the-art offline methods.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"2648-2657"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91190230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Tulyakov, Xavier Alameda-Pineda, E. Ricci, L. Yin, J. Cohn, N. Sebe
Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured in face videos and, surprisingly, used to estimate the heart rate (HR). While considerable progress has been made in the last few years, many issues remain open. In particular, state-of-the-art approaches are not robust enough to operate in natural conditions (e.g. in the presence of spontaneous movements, facial expressions, or illumination changes). In contrast to previous approaches that estimate the HR by processing all the skin pixels inside a fixed region of interest, we introduce a strategy to dynamically select face regions useful for robust HR estimation. Our approach, inspired by recent advances in matrix completion theory, allows us to predict the HR while simultaneously discovering the best regions of the face to use for estimation. A thorough experimental evaluation conducted on public benchmarks suggests that the proposed approach significantly outperforms state-of-the-art HR estimation methods in naturalistic conditions.
{"title":"Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions","authors":"S. Tulyakov, Xavier Alameda-Pineda, E. Ricci, L. Yin, J. Cohn, N. Sebe","doi":"10.1109/CVPR.2016.263","DOIUrl":"https://doi.org/10.1109/CVPR.2016.263","url":null,"abstract":"Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR). While considerable progress has been made in the last few years, still many issues remain open. In particular, state of-the-art approaches are not robust enough to operate in natural conditions (e.g. in case of spontaneous movements, facial expressions, or illumination changes). Opposite to previous approaches that estimate the HR by processing all the skin pixels inside a fixed region of interest, we introduce a strategy to dynamically select face regions useful for robust HR estimation. Our approach, inspired by recent advances on matrix completion theory, allows us to predict the HR while simultaneously discover the best regions of the face to be used for estimation. Thorough experimental evaluation conducted on public benchmarks suggests that the proposed approach significantly outperforms state-of the-art HR estimation methods in naturalistic conditions.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"39 1","pages":"2396-2404"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90238668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin Danelljan, G. Meneghetti, F. Khan, M. Felsberg
In recent years, sensors capable of measuring both color and depth information have become increasingly popular. Despite the abundance of colored point set data, state-of-the-art probabilistic registration techniques ignore the available color information. In this paper, we propose a probabilistic point set registration framework that exploits the color information associated with the points. Our method is based on a model of the joint distribution of 3D-point observations and their color information. The proposed model captures discriminative color information while remaining computationally efficient. We derive an EM algorithm for jointly estimating the model parameters and the relative transformations. Comprehensive experiments are performed on the Stanford Lounge dataset, captured by an RGB-D camera, and on two point sets captured by a Lidar sensor. Our results demonstrate a significant gain in robustness and accuracy when incorporating color information. On the Stanford Lounge dataset, our approach achieves a relative reduction of the failure rate by 78% compared to the baseline. Furthermore, our proposed model outperforms standard strategies for combining color and 3D-point information, leading to state-of-the-art results.
{"title":"A Probabilistic Framework for Color-Based Point Set Registration","authors":"Martin Danelljan, G. Meneghetti, F. Khan, M. Felsberg","doi":"10.1109/CVPR.2016.201","DOIUrl":"https://doi.org/10.1109/CVPR.2016.201","url":null,"abstract":"In recent years, sensors capable of measuring both color and depth information have become increasingly popular. Despite the abundance of colored point set data, stateof-the-art probabilistic registration techniques ignore the available color information. In this paper, we propose a probabilistic point set registration framework that exploits available color information associated with the points. Our method is based on a model of the joint distribution of 3D-point observations and their color information. The proposed model captures discriminative color information, while being computationally efficient. We derive an EM algorithm for jointly estimating the model parameters and the relative transformations. Comprehensive experiments are performed on the Stanford Lounge dataset, captured by an RGB-D camera, and two point sets captured by a Lidar sensor. Our results demonstrate a significant gain in robustness and accuracy when incorporating color information. On the Stanford Lounge dataset, our approach achieves a relative reduction of the failure rate by 78% compared to the baseline. Furthermore, our proposed model outperforms standard strategies for combining color and 3D-point information, leading to state-of-the-art results.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"685 1","pages":"1818-1826"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76874082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patch-based image synthesis has been enriched with global optimization on an image pyramid. Subsequently, gradient-based synthesis has improved structural coherence and detail. However, the gradient operator is directional and inconsistent, and requires computing multiple operators. It also imposes a significant computational burden for solving the Poisson equation, which often produces artifacts when the gradient field is non-integrable. In this paper, we propose a patch-based synthesis using a Laplacian pyramid to improve correspondence search with enhanced awareness of edge structures. In contrast to gradient operators, the Laplacian pyramid has the advantage of being isotropic in detecting changes, providing more consistent performance in decomposing the base structure and the detailed localization. Furthermore, it does not require heavy computation, as it employs an approximation by differences of Gaussians. We examine the potential of the Laplacian pyramid for enhanced edge-aware correspondence search and demonstrate the effectiveness of the Laplacian-based approach over state-of-the-art patch-based image synthesis methods.
{"title":"Laplacian Patch-Based Image Synthesis","authors":"J. Lee, Inchang Choi, Min H. Kim","doi":"10.1109/CVPR.2016.298","DOIUrl":"https://doi.org/10.1109/CVPR.2016.298","url":null,"abstract":"Patch-based image synthesis has been enriched with global optimization on the image pyramid. Successively, the gradient-based synthesis has improved structural coherence and details. However, the gradient operator is directional and inconsistent and requires computing multiple operators. It also introduces a significantly heavy computational burden to solve the Poisson equation that often accompanies artifacts in non-integrable gradient fields. In this paper, we propose a patch-based synthesis using a Laplacian pyramid to improve searching correspondence with enhanced awareness of edge structures. Contrary to the gradient operators, the Laplacian pyramid has the advantage of being isotropic in detecting changes to provide more consistent performance in decomposing the base structure and the detailed localization. Furthermore, it does not require heavy computation as it employs approximation by the differences of Gaussians. We examine the potentials of the Laplacian pyramid for enhanced edge-aware correspondence search. We demonstrate the effectiveness of the Laplacian-based approach over the state-of-the-art patchbased image synthesis methods.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"25 1","pages":"2727-2735"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84018229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a hierarchical composition approach for multi-view object tracking. The key idea is to adaptively exploit multiple cues in both 2D and 3D, e.g., ground occupancy consistency, appearance similarity, and motion coherence, which are mutually complementary when tracking the humans of interest over time. While online feature selection has been extensively studied in the literature, it remains unclear how to effectively schedule these cues for tracking, especially when encountering various challenges such as occlusions, conjunctions, and appearance variations. To do so, we propose a hierarchical composition model and re-formulate multi-view multi-object tracking as a problem of compositional structure optimization. We set up a set of composition criteria, each of which corresponds to one particular cue. The hierarchical composition process is pursued by exploiting different criteria, which impose constraints between a graph node and its offspring in the hierarchy. We learn the composition criteria using MLE on annotated data and efficiently construct the hierarchical graph with an iterative greedy pursuit algorithm. In the experiments, we demonstrate the superior performance of our approach on three public datasets, one of which was newly created by us to test various challenges in multi-view multi-object tracking.
{"title":"Multi-view People Tracking via Hierarchical Trajectory Composition","authors":"Yuanlu Xu, Xiaobai Liu, Yang Liu, Song-Chun Zhu","doi":"10.1109/CVPR.2016.461","DOIUrl":"https://doi.org/10.1109/CVPR.2016.461","url":null,"abstract":"This paper presents a hierarchical composition approach for multi-view object tracking. The key idea is to adaptively exploit multiple cues in both 2D and 3D, e.g., ground occupancy consistency, appearance similarity, motion coherence etc., which are mutually complementary while tracking the humans of interests over time. While feature online selection has been extensively studied in the past literature, it remains unclear how to effectively schedule these cues for the tracking purpose especially when encountering various challenges, e.g. occlusions, conjunctions, and appearance variations. To do so, we propose a hierarchical composition model and re-formulate multi-view multi-object tracking as a problem of compositional structure optimization. We setup a set of composition criteria, each of which corresponds to one particular cue. The hierarchical composition process is pursued by exploiting different criteria, which impose constraints between a graph node and its offsprings in the hierarchy. We learn the composition criteria using MLE on annotated data and efficiently construct the hierarchical graph by an iterative greedy pursuit algorithm. In the experiments, we demonstrate superior performance of our approach on three public datasets, one of which is newly created by us to test various challenges in multi-view multi-object tracking.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"11 1","pages":"4256-4265"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79707481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a modality hallucination architecture for training an RGB object detection model that incorporates depth side information at training time. Our convolutional hallucination network learns a new and complementary RGB image representation which is taught to mimic convolutional mid-level features from a depth network. At test time, images are processed jointly through the RGB and hallucination networks to produce improved detection performance. Thus, our method transfers information commonly extracted from depth training data to a network that can extract that information from the RGB counterpart. We present results on the standard NYUDv2 dataset and report improvements on the RGB detection task.
{"title":"Learning with Side Information through Modality Hallucination","authors":"Judy Hoffman, Saurabh Gupta, Trevor Darrell","doi":"10.1109/CVPR.2016.96","DOIUrl":"https://doi.org/10.1109/CVPR.2016.96","url":null,"abstract":"We present a modality hallucination architecture for training an RGB object detection model which incorporates depth side information at training time. Our convolutional hallucination network learns a new and complementary RGB image representation which is taught to mimic convolutional mid-level features from a depth network. At test time images are processed jointly through the RGB and hallucination networks to produce improved detection performance. Thus, our method transfers information commonly extracted from depth training data to a network which can extract that information from the RGB counterpart. We present results on the standard NYUDv2 dataset and report improvement on the RGB detection task.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"826-834"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77915820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}