
Latest publications from the 2014 IEEE Conference on Computer Vision and Pattern Recognition

Tracklet Association with Online Target-Specific Metric Learning
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.161
B. Wang, G. Wang, K. Chan, Li Wang
This paper presents a novel introduction of online target-specific metric learning in track fragment (tracklet) association by network flow optimization for long-term multi-person tracking. Different from other network flow formulations, each node in our network represents a tracklet, and each edge represents the likelihood of neighboring tracklets belonging to the same trajectory as measured by our proposed affinity score. In our method, target-specific similarity metrics are learned, which give rise to the appearance-based models used in the tracklet affinity estimation. Trajectory-based tracklets are refined by using the learned metrics to account for appearance consistency and to identify reliable tracklets. The metrics are then re-learned using reliable tracklets for computing tracklet affinity scores. Long-term trajectories are then obtained through network flow optimization. Occlusions and missed detections are handled by a trajectory completion step. Our method is effective for long-term tracking even when the targets are spatially close or completely occluded by others. We validate our proposed framework on several public datasets and show that it outperforms several state-of-the-art methods.
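As an illustration of the association step, the sketch below links tracklet tails to tracklet heads using an appearance affinity computed under a Mahalanobis-style metric and a one-to-one assignment. The descriptor vectors, the metric matrix `M`, and the `min_affinity` gate are illustrative assumptions, and the greedy bipartite matching is a simplification of the paper's global network-flow optimization, not the authors' implementation.

```python
# Simplified tracklet linking: Mahalanobis-style affinity under a learned metric,
# followed by one-to-one assignment. The paper solves a global network-flow
# problem instead; this bipartite step only illustrates the affinity idea.
import numpy as np
from scipy.optimize import linear_sum_assignment

def tracklet_affinity(desc_a, desc_b, M):
    """Appearance affinity between two tracklet descriptors under metric M."""
    d = desc_a - desc_b
    dist = float(d @ M @ d)          # squared Mahalanobis-style distance
    return np.exp(-dist)             # higher = more likely the same target

def link_tracklets(tail_descs, head_descs, M, min_affinity=0.5):
    """Match tracklet tails to tracklet heads by maximizing total affinity."""
    A = np.array([[tracklet_affinity(a, b, M) for b in head_descs]
                  for a in tail_descs])
    rows, cols = linear_sum_assignment(-A)           # maximize affinity
    return [(i, j) for i, j in zip(rows, cols) if A[i, j] >= min_affinity]

# Toy example: 3 ending tracklets, 3 starting tracklets, identity metric.
rng = np.random.default_rng(0)
tails = rng.normal(size=(3, 8))
heads = tails + 0.05 * rng.normal(size=(3, 8))       # slightly perturbed copies
links = link_tracklets(tails, heads, np.eye(8))
print(links)                                          # expect [(0, 0), (1, 1), (2, 2)]
```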
Citations: 110
Video Classification Using Semantic Concept Co-occurrences
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.324
Shayan Modiri Assari, A. Zamir, M. Shah
We address the problem of classifying complex videos based on their content. A typical approach to this problem is performing the classification using semantic attributes, commonly termed concepts, which occur in the video. In this paper, we propose a contextual approach to video classification based on Generalized Maximum Clique Problem (GMCP) which uses the co-occurrence of concepts as the context model. To be more specific, we propose to represent a class based on the co-occurrence of its concepts and classify a video based on matching its semantic co-occurrence pattern to each class representation. We perform the matching using GMCP which finds the strongest clique of co-occurring concepts in a video. We argue that, in principle, the co-occurrence of concepts yields a richer representation of a video compared to most of the current approaches. Additionally, we propose a novel optimal solution to GMCP based on Mixed Binary Integer Programming (MBIP). The evaluations show our approach, which opens new opportunities for further research in this direction, outperforms several well established video classification methods.
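A minimal sketch of the co-occurrence idea, assuming each video is given as a binary concept-detection vector: each class is summarized by a concept co-occurrence matrix, and a test video is assigned to the class under which its detected concepts co-occur most strongly. The pairwise scoring below is only a stand-in for the paper's GMCP/MBIP clique optimization.

```python
# Simplified classification by concept co-occurrence: each class is summarized
# by a co-occurrence matrix over concepts, and a video is scored by how strongly
# its detected concepts co-occur under that class model. The actual paper solves
# a Generalized Maximum Clique Problem (via MBIP); this pairwise sum is only a
# stand-in for the clique score.
import numpy as np

def class_cooccurrence(videos, n_concepts):
    """Build a class co-occurrence matrix from binary concept vectors."""
    C = np.zeros((n_concepts, n_concepts))
    for v in videos:
        idx = np.flatnonzero(v)
        C[np.ix_(idx, idx)] += 1.0
    np.fill_diagonal(C, 0.0)
    return C / max(len(videos), 1)

def score(video, C):
    """Sum of pairwise co-occurrence weights among the video's detected concepts."""
    idx = np.flatnonzero(video)
    return C[np.ix_(idx, idx)].sum() / 2.0

# Toy data: concepts 0-1 co-occur in class A, concepts 2-3 in class B.
A = class_cooccurrence([np.array([1, 1, 0, 0]), np.array([1, 1, 0, 1])], 4)
B = class_cooccurrence([np.array([0, 0, 1, 1]), np.array([1, 0, 1, 1])], 4)
test = np.array([1, 1, 0, 0])
print("A" if score(test, A) > score(test, B) else "B")   # expect "A"
```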
Citations: 38
Who Do I Look Like? Determining Parent-Offspring Resemblance via Gated Autoencoders
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.227
Afshin Dehghan, E. Ortiz, Ruben Villegas, M. Shah
Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks. In this paper, we consider the difficult task of determining parent-offspring resemblance using deep learning to answer the question "Who do I look like?" Although humans can perform this job at a rate higher than chance, it is not clear how they do it [2]. However, recent studies in anthropology [24] have determined which features tend to be the most discriminative. In this study, we aim not only to create an accurate system for resemblance detection, but also to bridge the gap between anthropological studies and computer vision techniques. Further, we aim to answer two key questions: 1) Do offspring resemble their parents? and 2) Do offspring resemble one parent more than the other? We propose an algorithm that fuses the features and metrics discovered via gated autoencoders with a discriminative neural network layer that learns the optimal, or what we call genetic, features to delineate parent-offspring relationships. We further analyze the correlation between our automatically detected features and those found in anthropological studies. Meanwhile, our method outperforms the state-of-the-art in kinship verification by 3-10% depending on the relationship, using specific (father-son, mother-daughter, etc.) and generic models.
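For readers unfamiliar with gated autoencoders, the following is a minimal forward pass of a factored gated autoencoder relating two feature vectors (say, parent and child). The weight matrices are random placeholders rather than learned parameters, and the discriminative layer the paper trains on top of the mapping units is omitted.

```python
# Minimal forward pass of a factored gated autoencoder relating two face
# feature vectors. Weights here are random for illustration; in the paper they
# are learned, and a discriminative layer is trained on top of the mapping
# units h to decide kinship.
import numpy as np

rng = np.random.default_rng(0)
d, n_factors, n_hidden = 64, 32, 16
Wx = rng.normal(scale=0.1, size=(n_factors, d))   # factor weights for image x
Wy = rng.normal(scale=0.1, size=(n_factors, d))   # factor weights for image y
Wh = rng.normal(scale=0.1, size=(n_hidden, n_factors))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_encode(x, y):
    """Mapping units encoding the relation between x and y."""
    fx, fy = Wx @ x, Wy @ y
    return sigmoid(Wh @ (fx * fy))        # multiplicative (gated) interaction

def reconstruct_y(x, h):
    """Reconstruct y from x and the relational code h."""
    fx = Wx @ x
    return Wy.T @ (fx * (Wh.T @ h))

x, y = rng.normal(size=d), rng.normal(size=d)
h = gated_encode(x, y)
print(h.shape, reconstruct_y(x, h).shape)          # (16,) (64,)
```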
Citations: 111
Facial Expression Recognition via a Boosted Deep Belief Network
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.233
Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) for performing the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features, which is effective to characterize expression-related facial appearance/shape changes, can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capabilities of selected features are strengthened as well according to their relative importance to the strong classifier via a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
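The sketch below illustrates only the boosting side of the framework: shallow weak learners boosted over precomputed per-patch features with scikit-learn's AdaBoost. It is a simplified stand-in, not the BDBN, which learns the patch features themselves with deep belief networks and fine-tunes them jointly with the boosting weights; the feature matrix and labels here are synthetic placeholders.

```python
# Simplified stand-in for the boosted-classifier stage: boosting over
# precomputed per-patch features. The BDBN of the paper instead learns the
# patch features with deep belief networks and fine-tunes them jointly with
# the boosting weights.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
n_samples, n_patch_features = 200, 50
X = rng.normal(size=(n_samples, n_patch_features))   # hypothetical per-patch features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)        # toy binary expression labels

# The default weak learner is a depth-1 decision stump, i.e. a single-feature
# test, which plays the role of selecting one discriminative feature per round.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```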
Citations: 557
Temporal Segmentation of Egocentric Videos
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.325
Y. Poleg, Chetan Arora, Shmuel Peleg
The use of wearable cameras makes it possible to record life-logging egocentric videos. Browsing such long unstructured videos is time-consuming and tedious. Segmentation into meaningful chapters is an important first step towards adding structure to egocentric videos, enabling efficient browsing, indexing and summarization of the long videos. Two sources of information for video segmentation are (i) the motion of the camera wearer, and (ii) the objects and activities recorded in the video. In this paper we address the motion cues for video segmentation. Motion-based segmentation is especially difficult in egocentric videos when the camera is constantly moving due to natural head movement of the wearer. We propose a robust temporal segmentation of egocentric videos into a hierarchy of motion classes using new Cumulative Displacement Curves. Unlike instantaneous motion vectors, segmentation using integrated motion vectors performs well even in dynamic and crowded scenes. No assumptions are made on the underlying scene structure and the method works in indoor as well as outdoor situations. We demonstrate the effectiveness of our approach using publicly available videos as well as choreographed videos. We also suggest an approach to detect the fixation of the wearer's gaze in the walking portion of the egocentric videos.
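A minimal sketch of the cumulative-displacement idea, assuming the per-frame mean optical-flow vector is already available: integrating the motion vectors over time suppresses instantaneous head-motion noise, and segments can then be found where the curve's slope changes. The window size, threshold, and two-class labels are illustrative assumptions, not the paper's motion-class hierarchy.

```python
# Sketch of the cumulative-displacement idea: integrate per-frame motion
# vectors over time so slow, consistent trends (walking, standing) dominate
# instantaneous head-motion noise, then label frames from the local slope.
import numpy as np

def cumulative_displacement(flow_xy):
    """flow_xy: (T, 2) mean flow vector per frame -> (T, 2) cumulative curve."""
    return np.cumsum(flow_xy, axis=0)

def segment_by_slope(curve, window=30, move_thresh=0.5):
    """Label each frame 'moving' or 'static' from the local slope of the curve."""
    T = len(curve)
    labels = []
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window)
        slope = np.linalg.norm(curve[hi - 1] - curve[lo]) / max(hi - lo, 1)
        labels.append("moving" if slope > move_thresh else "static")
    return labels

# Toy sequence: 100 static frames (noise only) followed by 100 walking frames.
rng = np.random.default_rng(0)
flow = np.vstack([0.2 * rng.normal(size=(100, 2)),
                  np.tile([1.0, 0.0], (100, 1)) + 0.2 * rng.normal(size=(100, 2))])
labels = segment_by_slope(cumulative_displacement(flow))
print(labels[50], labels[150])    # expect roughly: static moving
```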
Citations: 171
Sparse Dictionary Learning for Edit Propagation of High-Resolution Images
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.365
Xiaowu Chen, Dongqing Zou, Jianwei Li, Xiaochun Cao, Qinping Zhao, Hao Zhang
We introduce a method of sparse dictionary learning for edit propagation of high-resolution images or video. Previous approaches for edit propagation typically employ a global optimization over the whole set of image pixels, incurring a prohibitively high memory and time consumption for high-resolution images. Rather than propagating an edit pixel by pixel, we follow the principle of sparse representation to obtain a compact set of representative samples (or features) and perform edit propagation on the samples instead. The sparse set of samples provides an intrinsic basis for an input image, and the coding coefficients capture the linear relationship between all pixels and the samples. The representative set of samples is then optimized by a novel scheme which maximizes the KL-divergence between each sample pair to remove redundant samples. We show several applications of sparsity-based edit propagation including video recoloring, theme editing, and seamless cloning, operating on both color and texture features. We demonstrate that with a sample-to-pixel ratio on the order of 0.01%, signifying a significant reduction in memory consumption, our method still maintains a high degree of visual fidelity.
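The following sketch shows the sample-based propagation pipeline under simplifying assumptions: representative samples are picked with KMeans rather than the paper's KL-divergence pruning, and the coding coefficients are plain Gaussian affinities rather than learned sparse codes. The point is only that an edit applied to the few samples can be mapped back to every pixel through the coefficients.

```python
# Sketch of sample-based edit propagation: pick a small set of representative
# samples, express every pixel as an affinity-weighted combination of the
# samples, edit only the samples, and map the edit back to all pixels through
# the coefficients. KMeans and Gaussian affinities are simplifications of the
# paper's KL-divergence sample pruning and learned sparse codes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.random((10_000, 5))                # per-pixel features (RGB + x, y)

# 1. Representative samples (far fewer than pixels).
k = 32
samples = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels).cluster_centers_

# 2. Coding coefficients: soft assignment of each pixel to the samples.
d2 = ((pixels[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.05)
W /= W.sum(axis=1, keepdims=True)               # rows sum to 1

# 3. Edit the samples only (e.g. a scalar recoloring strength per sample) ...
sample_edit = (samples[:, 0] > 0.5).astype(float)   # toy edit on "red" samples

# 4. ... and propagate to every pixel through the coefficients.
pixel_edit = W @ sample_edit
print(pixel_edit.shape, float(pixel_edit.min()), float(pixel_edit.max()))
```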
Citations: 42
Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.210
Aristotle Spyropoulos, N. Komodakis, Philippos Mordohai
While machine learning has been instrumental to the ongoing progress in most areas of computer vision, it has not been applied to the problem of stereo matching with similar frequency or success. We present a supervised learning approach for predicting the correctness of stereo matches based on a random forest and a set of features that capture various forms of information about each pixel. We show highly competitive results in predicting the correctness of matches and in confidence estimation, which allows us to rank pixels according to the reliability of their assigned disparities. Moreover, we show how these confidence values can be used to improve the accuracy of disparity maps by integrating them with an MRF-based stereo algorithm. This is an important distinction from current literature that has mainly focused on sparsification by removing potentially erroneous disparities to generate quasi-dense disparity maps.
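A minimal sketch of the learning step, assuming two standard confidence cues (left-right disparity difference and best-to-second-best cost ratio) and synthetic labels: a random forest predicts whether each match is correct, and its class probability serves as a confidence for ranking pixels. The feature set and labels are placeholders for the paper's richer features and ground-truth disparities.

```python
# Sketch: a random forest trained on simple per-pixel confidence cues to
# predict whether a disparity is correct; predict_proba ranks pixels by
# reliability. The two features used here are common cues but only stand-ins
# for the paper's full feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000
lr_diff = rng.exponential(scale=1.0, size=n)        # |d_left - d_right| in pixels
cost_ratio = rng.uniform(0.0, 1.0, size=n)          # best cost / second-best cost
X = np.column_stack([lr_diff, cost_ratio])
# Synthetic ground truth: matches with consistent disparities and distinctive
# costs are usually correct.
y = ((lr_diff < 1.0) & (cost_ratio < 0.8)).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
confidence = forest.predict_proba(X)[:, 1]           # P(match is correct)
ranking = np.argsort(-confidence)                     # most reliable pixels first
print(confidence[ranking[:5]])
```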
Citations: 108
Complex Non-rigid Motion 3D Reconstruction by Union of Subspaces
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.200
Yingying Zhu, Dong Huang, F. D. L. Torre, S. Lucey
The task of estimating complex non-rigid 3D motion through a monocular camera is of increasing interest to the wider scientific community. Assuming one has the 2D point tracks of the non-rigid object in question, the vision community refers to this problem as Non-Rigid Structure from Motion (NRSfM). In this paper we make two contributions. First, we demonstrate empirically that the current state-of-the-art approach to NRSfM (i.e. Dai et al. [5]) exhibits poor reconstruction performance on complex motion (i.e. motions involving a sequence of primitive actions such as walk, sit, and stand performed by a human subject). Second, we propose that this limitation can be circumvented by modeling complex motion as a union of subspaces. This does not naturally occur in Dai et al.'s approach, which instead makes a less compact summation-of-subspaces assumption. Experiments on both synthetic and real videos illustrate the benefits of our approach for complex non-rigid motion analysis.
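The union-of-subspaces assumption can be illustrated in isolation, leaving the actual NRSfM reconstruction aside: below, a toy trajectory matrix generated from two different low-dimensional "actions" is fit much better by two small subspaces (one per cluster of frames) than by a single subspace of the same rank. The KMeans clustering and per-cluster SVD are illustrative; the paper estimates the subspaces jointly with the 3D structure.

```python
# Illustration of the union-of-subspaces assumption: frames of a complex motion
# (e.g. walk, then sit) are better explained by several small low-rank
# subspaces than by one global low-rank subspace of the same rank.
import numpy as np
from sklearn.cluster import KMeans

def low_rank_residual(X, rank):
    """Frobenius residual of the best rank-r approximation of X (rows = frames)."""
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    s[rank:] = 0.0
    return np.linalg.norm(Xc - (U * s) @ Vt)

rng = np.random.default_rng(0)
# Two "actions", each living on its own 2D subspace of a 30-D shape space,
# with distinct offsets so that a simple clustering separates the frames.
basis_a, basis_b = rng.normal(size=(2, 30)), rng.normal(size=(2, 30))
offset = 5.0 * rng.normal(size=30)
frames = np.vstack([rng.normal(size=(100, 2)) @ basis_a + offset,
                    rng.normal(size=(100, 2)) @ basis_b - offset])

single = low_rank_residual(frames, rank=2)
labels = KMeans(n_clusters=2, n_init=4, random_state=0).fit_predict(frames)
union = sum(low_rank_residual(frames[labels == c], rank=2) for c in (0, 1))
print(f"single rank-2 subspace residual: {single:.2f}, union of two: {union:.2f}")
```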
Citations: 133
Photometric Stereo Using Constrained Bivariate Regression for General Isotropic Surfaces
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.280
Satoshi Ikehata, K. Aizawa
This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner. Following the recently proposed sum-of-lobes representation of the isotropic reflectance function, we constructed a constrained bivariate regression problem where the regression function is approximated by smooth, bivariate Bernstein polynomials. The unknown normal vector was separated from the unknown reflectance function by considering the inverse representation of the image formation process, and then we could accurately compute the unknown surface normals by solving a simple and efficient quadratic programming problem. Extensive evaluations that showed the state-of-the-art performance using both synthetic and real-world images were performed.
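For contrast with the general isotropic model, the sketch below shows the classical pixelwise baseline: Lambertian photometric stereo, where the normal at a pixel is recovered by least squares from its intensities under known light directions. The constrained bivariate Bernstein-polynomial regression that lets the paper handle general isotropic reflectance is not reproduced here.

```python
# Baseline pixelwise photometric stereo under a Lambertian assumption: recover
# the surface normal at one pixel from its intensities under known lights by
# least squares. The paper replaces the Lambertian model with a constrained
# bivariate (Bernstein-polynomial) regression for general isotropic BRDFs.
import numpy as np

def lambertian_normal(intensities, light_dirs):
    """intensities: (m,), light_dirs: (m, 3) unit vectors -> unit normal (3,)."""
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # g = albedo * n
    return g / np.linalg.norm(g)

# Toy pixel: true normal n, albedo 0.8, observed under 6 known lights that all
# face the surface (no shadows or specularities in this sketch).
rng = np.random.default_rng(0)
n_true = np.array([0.2, -0.1, 1.0])
n_true /= np.linalg.norm(n_true)
L = n_true + 0.4 * rng.normal(size=(6, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)
I = 0.8 * (L @ n_true)                                # Lambertian shading
print(lambertian_normal(I, L))                        # approximately n_true
```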
Citations: 98
Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.229
Kenneth Alberto Funes Mora, J. Odobez
We propose a head pose invariant gaze estimation model for distant RGB-D cameras. It relies on a geometric understanding of the 3D gaze action and generation of eye images. By introducing a semantic segmentation of the eye region within a generative process, the model (i) avoids the critical feature tracking of geometrical approaches requiring high-resolution images, and (ii) decouples the person-dependent geometry from the ambient conditions, allowing adaptation to different conditions without retraining. Priors in the generative framework are adequate for training from few samples. In addition, the model is capable of gaze extrapolation, allowing for less restrictive training schemes. Comparisons with state-of-the-art methods validate these properties, which make our method highly valuable for addressing many diverse tasks in sociology, HRI and HCI.
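The geometric core of model-based gaze estimation can be sketched as follows, assuming the 3D eyeball center and pupil center are already known (for example from an RGB-D fit): the gaze ray runs from the eyeball center through the pupil, and yaw and pitch follow from that ray. The generative inference from segmented eye images that the paper actually performs is not shown, and the sign conventions below are one possible camera-frame choice.

```python
# Geometric core of model-based gaze estimation: the gaze ray goes from the 3D
# eyeball center through the 3D pupil center; yaw and pitch follow from that
# ray. Both 3D points are assumed known here (e.g. from an RGB-D fit).
import numpy as np

def gaze_angles(eyeball_center, pupil_center):
    """Return (yaw, pitch) in degrees of the ray eyeball_center -> pupil_center.

    Convention: camera looks along +z, so a gaze straight at the camera has
    yaw = pitch = 0; signs of yaw/pitch depend on the chosen camera frame.
    """
    v = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
    v /= np.linalg.norm(v)
    yaw = np.degrees(np.arctan2(v[0], -v[2]))      # left/right
    pitch = np.degrees(np.arcsin(v[1]))            # up/down
    return yaw, pitch

# Toy example in a camera frame (coordinates in meters).
eyeball = np.array([0.030, 0.020, 0.600])
pupil   = np.array([0.029, 0.024, 0.588])          # pupil slightly toward camera
print(gaze_angles(eyeball, pupil))
```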
Citations: 79