
Latest Publications: 2015 IEEE International Conference on Computer Vision (ICCV)

SpeDo: 6 DOF Ego-Motion Sensor Using Speckle Defocus Imaging
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.491
Kensei Jo, Mohit Gupta, S. Nayar
Sensors that measure their motion with respect to the surrounding environment (ego-motion sensors) can be broadly classified into two categories. The first is inertial sensors, such as accelerometers. To estimate position and velocity, these sensors integrate the measured acceleration, which often results in the accumulation of large errors over time. The second is camera-based approaches, such as SLAM, which can measure position directly but whose performance depends on the surrounding scene's properties. These approaches cannot function reliably if the scene has low-frequency textures or small depth variations. We present a novel ego-motion sensor called SpeDo that addresses these fundamental limitations. SpeDo is based on coherent light sources and cameras with large defocus. Coherent light, on interacting with a scene, creates a high-frequency interferometric pattern in the captured images, called speckle. We develop a theoretical model for speckle flow (the motion of speckle as a function of sensor motion) and show that it is quasi-invariant to the surrounding scene's properties. As a result, SpeDo can measure ego-motion (not a derivative of motion) simply by estimating optical flow at a few image locations. We have built a low-cost, compact hardware prototype of SpeDo and demonstrated high-precision 6 DOF ego-motion estimation for complex trajectories in scenarios where the scene properties are challenging (e.g., repeating or no texture) as well as unknown.
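The key property above, that speckle flow is approximately linear in the 6 DOF motion and quasi-invariant to the scene, reduces ego-motion estimation to solving a small linear system. A minimal numpy sketch of that step follows; the per-location Jacobians and flow measurements are synthetic placeholders, not the paper's calibrated sensor model.

```python
import numpy as np

# Minimal sketch: recover 6 DOF ego-motion from speckle flow at a few
# image locations, assuming (as the paper shows) flow is approximately
# linear in the motion and quasi-invariant to the scene. The per-point
# 2x6 Jacobians J are hypothetical placeholders for the calibrated
# sensitivity of speckle flow to sensor motion.

rng = np.random.default_rng(0)
num_points = 6                                 # a few image locations suffice
J = rng.standard_normal((num_points, 2, 6))    # per-point 2x6 Jacobians

true_motion = np.array([0.1, -0.2, 0.05, 0.01, -0.02, 0.03])  # (tx,ty,tz,rx,ry,rz)

# Simulated speckle-flow measurements with a little noise
flow = J @ true_motion + 0.001 * rng.standard_normal((num_points, 2))

# Stack all 2D flow equations into one linear system and solve
A = J.reshape(-1, 6)                           # (2*num_points, 6)
b = flow.reshape(-1)
motion_est, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated 6 DOF motion:", np.round(motion_est, 3))
```

In the actual sensor the Jacobians would come from the defocus and speckle imaging geometry; the point is only that a handful of 2D flow vectors over-determine the six motion parameters.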
Citations: 23
Robust Statistical Face Frontalization
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.441
Christos Sagonas, Yannis Panagakis, S. Zafeiriou, M. Pantic
Recently, it has been shown that excellent results can be achieved in both facial landmark localization and pose-invariant face recognition. These breakthroughs are attributed to the community's efforts to manually annotate facial images in many different poses and to collect 3D facial data. In this paper, we propose a novel method for joint frontal view reconstruction and landmark localization using only a small set of frontal images. Observing that the frontal facial image is the one having the minimum rank among all poses, we devise a model that jointly recovers the frontalized version of the face and the facial landmarks. To this end, a suitable optimization problem, involving minimization of the nuclear norm and the matrix ℓ1 norm, is solved. The proposed method is assessed on frontal face reconstruction, facial landmark localization, pose-invariant face recognition, and face verification in unconstrained conditions. The relevant experiments have been conducted on 8 databases, and the results demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods for the target problems.
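The optimization couples a nuclear-norm term (encouraging the recovered frontal view to be low-rank) with a matrix ℓ1 term (absorbing sparse errors). A generic sketch of these two building blocks is robust PCA solved by ADMM, shown below; this is not the paper's exact formulation, which additionally handles landmarks and warping, but it illustrates the two proximal operators involved.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: prox of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, lam=None, mu=1.0, iters=200):
    """Decompose M into low-rank L plus sparse S by ADMM,
    minimizing ||L||_* + lam * ||S||_1 subject to L + S = M."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = soft(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
    return L, S

# Toy example: a rank-1 "frontal" component corrupted by sparse errors
rng = np.random.default_rng(0)
low_rank = np.outer(rng.standard_normal(40), rng.standard_normal(30))
sparse = (rng.random((40, 30)) < 0.05) * 5.0
L, S = robust_pca(low_rank + sparse)
print("recovered rank:", np.linalg.matrix_rank(L, tol=1e-2))
```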
Citations: 103
SPM-BP: Sped-Up PatchMatch Belief Propagation for Continuous MRFs
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.456
Yu Li, Dongbo Min, M. S. Brown, M. Do, Jiangbo Lu
Markov random fields are widely used to model many computer vision problems that can be cast in an energy minimization framework composed of unary and pairwise potentials. While computationally tractable discrete optimizers such as Graph Cuts and belief propagation (BP) exist for multi-label discrete problems, they still face prohibitively high computational challenges when the labels reside in a huge or very densely sampled space. Integrating key ideas from PatchMatch, namely effective particle propagation and resampling, PatchMatch belief propagation (PMBP) has been demonstrated to perform well on continuous labeling problems and runs orders of magnitude faster than particle BP (PBP). However, the quality of the PMBP solution is tightly coupled with the local window size over which the raw data cost is aggregated to mitigate ambiguity in the data constraint. This dependency heavily influences the overall complexity, which increases linearly with the window size. This paper proposes a novel algorithm called sped-up PMBP (SPM-BP) that tackles this critical computational bottleneck and speeds up PMBP by 50-100 times. The crux of SPM-BP lies in unifying efficient filter-based cost aggregation and message passing with PatchMatch-based particle generation in a highly effective way. Though simple in its formulation, SPM-BP achieves superior performance for sub-pixel-accurate stereo and optical flow on benchmark datasets when compared with more complex, task-specific approaches.
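The PatchMatch ingredient that PMBP and SPM-BP build on is label search by neighbor propagation plus randomized local refinement. The sketch below runs that search on a toy continuous labeling problem with a synthetic per-pixel cost; it omits SPM-BP's message passing and filter-based cost aggregation entirely.

```python
import numpy as np

# Sketch of PatchMatch-style label search: each pixel keeps a continuous
# label (e.g. disparity), improved by (a) propagating good labels from
# already-visited neighbors and (b) local random search with a shrinking
# radius. The quadratic cost is a toy stand-in for a matching cost.

rng = np.random.default_rng(1)
H, W = 32, 32
true_labels = np.fromfunction(lambda y, x: 0.1 * x, (H, W))   # ground truth
cost = lambda y, x, l: (l - true_labels[y, x]) ** 2           # toy unary cost

labels = rng.uniform(0, 4, (H, W))            # random initialization
for it in range(4):
    forward = (it % 2 == 0)
    ys = range(H) if forward else range(H - 1, -1, -1)
    xs = range(W) if forward else range(W - 1, -1, -1)
    dy, dx = (-1, -1) if forward else (1, 1)  # neighbors visited earlier this pass
    for y in ys:
        for x in xs:
            # propagation: adopt a neighbor's label if it fits better
            for cand in (labels[min(max(y + dy, 0), H - 1), x],
                         labels[y, min(max(x + dx, 0), W - 1)]):
                if cost(y, x, cand) < cost(y, x, labels[y, x]):
                    labels[y, x] = cand
            # random search: exponentially shrinking window
            r = 2.0
            while r > 0.01:
                cand = labels[y, x] + rng.uniform(-r, r)
                if cost(y, x, cand) < cost(y, x, labels[y, x]):
                    labels[y, x] = cand
                r /= 2.0
print("mean abs error:", np.abs(labels - true_labels).mean())
```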
Citations: 87
Blur-Aware Disparity Estimation from Defocus Stereo Images
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.104
Ching-Hui Chen, Hui Zhou, T. Ahonen
Defocus blur usually causes performance degradation when establishing visual correspondence between stereo images. We propose a blur-aware disparity estimation method that is robust to the mismatch of focus in stereo images. The relative blur resulting from this focus mismatch is approximated as the difference of the squared diameters of the blur kernels. Based on the defocus and stereo model, we propose the relative blur versus disparity (RBD) model, which characterizes the relative blur as a second-order polynomial function of disparity. Our method alternates between an RBD model update and a disparity update in each iteration. The RBD model in return refines the disparity estimation by updating the matching cost and aggregation weight to compensate for the mismatch of focus. Experiments on both synthesized and real datasets demonstrate the effectiveness of our proposed algorithm.
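The RBD model itself is just a quadratic in disparity, so the model-update step amounts to a polynomial fit. The sketch below simulates that step with synthetic coefficients and noisy samples; the actual method would draw the (disparity, relative blur) samples from the current matching estimates.

```python
import numpy as np

# Sketch of the RBD (relative blur vs. disparity) idea: the relative
# blur between the two views, measured as the difference of squared
# blur-kernel diameters, is modeled as a quadratic in disparity.
# Coefficients and samples are synthetic placeholders.

rng = np.random.default_rng(2)
a, b, c = 0.02, -0.1, 1.5                   # hypothetical RBD coefficients
disp = np.linspace(0, 40, 200)              # candidate disparities
rel_blur = a * disp**2 + b * disp + c + 0.05 * rng.standard_normal(disp.size)

# Model-update step: fit the quadratic RBD model from current estimates
coeffs = np.polyfit(disp, rel_blur, deg=2)
print("recovered (a, b, c):", np.round(coeffs, 3))

# The disparity-update step would then re-blur the sharper patch by the
# predicted relative blur before computing the matching cost:
predicted = np.polyval(coeffs, disp)
```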
Citations: 18
A Unified Multiplicative Framework for Attribute Learning
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.288
K. Liang, Hong-Yi Chang, S. Shan, Xilin Chen
Attributes are mid-level semantic properties of objects. Recent research has shown that visual attributes can benefit many traditional learning problems in the computer vision community. However, attribute learning is still a challenging problem, as attributes may not always be predictable directly from input images, and the variation of visual attributes across categories is sometimes large. In this paper, we propose a unified multiplicative framework for attribute learning that tackles these key problems. Specifically, images and category information are jointly projected into a shared feature space, where the latent factors are disentangled and multiplied for attribute prediction. The resulting attribute classifier is category-specific instead of being shared by all categories. Moreover, our method can leverage auxiliary data to enhance the predictive ability of attribute classifiers, reducing the effort of instance-level attribute annotation to some extent. Experimental results show that our method achieves superior performance on both instance-level and category-level attribute prediction. For zero-shot learning based on attributes, our method significantly improves the state-of-the-art performance on the AwA dataset and achieves comparable performance on the CUB dataset.
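A minimal sketch of a multiplicative interaction of this kind is shown below: image features and category embeddings are projected into a shared latent space, multiplied elementwise, and read out as attribute scores, so the effective classifier depends on the category. All dimensions and weights are illustrative, not the paper's learned parameters.

```python
import numpy as np

# Sketch of a multiplicative attribute model: image features and
# category embeddings meet in a shared latent space, and their
# elementwise product drives attribute prediction, making the
# classifier category-specific.

rng = np.random.default_rng(3)
d_img, d_cat, d_latent, n_attr = 128, 16, 32, 10
U = rng.standard_normal((d_latent, d_img)) * 0.1   # image projection
V = rng.standard_normal((d_latent, d_cat)) * 0.1   # category projection
W = rng.standard_normal((n_attr, d_latent)) * 0.1  # attribute readout

def predict_attributes(x, c):
    """Attribute scores from an image feature x and category vector c."""
    z = (U @ x) * (V @ c)                    # multiplicative interaction
    return 1.0 / (1.0 + np.exp(-(W @ z)))    # sigmoid scores

x = rng.standard_normal(d_img)   # an image feature
c = rng.standard_normal(d_cat)   # its category embedding
print(np.round(predict_attributes(x, c), 2))
```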
Citations: 25
Direct, Dense, and Deformable: Template-Based Non-rigid 3D Reconstruction from RGB Video
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.111
Rui Yu, Chris Russell, N. Campbell, L. Agapito
In this paper we tackle the problem of capturing the dense, detailed 3D geometry of generic, complex non-rigid meshes using a single RGB-only commodity video camera and a direct approach. While robust and even real-time solutions exist to this problem if the observed scene is static, for non-rigid dense shape capture current systems are typically restricted to the use of complex multi-camera rigs, take advantage of the additional depth channel available in RGB-D cameras, or deal with specific shapes such as faces or planar surfaces. In contrast, our method uses a single RGB video as input; it can capture the deformations of generic shapes, and its depth estimation is dense, per-pixel, and direct. We first compute a dense 3D template of the object's shape from a short rigid sequence, and subsequently perform online reconstruction of the non-rigid mesh as it evolves over time. Our energy optimization approach minimizes a robust photometric cost while simultaneously estimating the temporal correspondences and 3D deformations with respect to the template mesh. In our experimental evaluation we show a range of qualitative results on novel datasets, compare against an existing method that requires multi-frame optical flow, and perform a quantitative evaluation against other template-based approaches on a ground-truth dataset.
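At each frame, the method minimizes a robust photometric cost plus a deformation prior. The sketch below sets up such an energy on a 1D toy signal, with a Huber photometric term and a first-order smoothness term, and minimizes it with L-BFGS; the sampling function I(.), the template, and all constants are synthetic stand-ins for the dense per-pixel 3D formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the per-frame energy: robust photometric term on template
# points plus a smoothness prior on the deformation. I(.) stands in for
# sampling the current frame; here it is a synthetic 1D intensity row.

grid = np.linspace(0, 1, 200)
frame = np.sin(8 * grid)                        # synthetic image row
I = lambda u: np.interp(u, grid, frame)         # sampling the frame

template_u = np.linspace(0.2, 0.8, 20)          # template point positions
template_i = I(template_u + 0.03)               # "observed" appearance

def huber(r, delta=0.05):
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def energy(du):
    photometric = huber(I(template_u + du) - template_i).sum()
    smoothness = 10.0 * np.sum(np.diff(du) ** 2)   # neighbors move alike
    return photometric + smoothness

res = minimize(energy, np.zeros_like(template_u), method="L-BFGS-B")
print("recovered shift (should be near 0.03):", np.round(res.x.mean(), 3))
```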
Citations: 87
Visual Phrases for Exemplar Face Detection
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.231
Vijay Kumar, A. Namboodiri, C. V. Jawahar
Recently, exemplar-based approaches have been successfully applied to face detection in the wild. Contrary to traditional approaches that model face variations from a large and diverse set of training examples, exemplar-based approaches use a collection of discriminatively trained exemplars for detection. In this paradigm, each exemplar casts a vote using a retrieval framework and generalized Hough voting to locate the faces in the target image. The advantage of this approach is that, with a large database covering all possible variations, faces in challenging conditions can be detected without having to learn explicit models for different variations. Current schemes, however, assume independence between the visual words, ignoring their relations in the process. They also ignore the spatial consistency of the visual words. Consequently, every exemplar word contributes equally during voting regardless of its location. In this paper, we propose a novel approach that incorporates higher-order information in the voting process. We discover visual phrases that contain semantically related visual words and exploit them for detection along with the visual words. For spatial consistency, we estimate the spatial distribution of visual words and phrases from the entire database and then weight their occurrences in exemplars. This ensures that a visual word or phrase in an exemplar makes a major contribution only if it occurs at its semantic location, thereby suppressing noise significantly. We perform extensive experiments on the standard FDDB, AFW, and G-album datasets and show significant improvement over previous exemplar-based approaches.
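The voting scheme can be sketched in a few lines: each matched word or phrase predicts a face-center offset, and its vote is down-weighted when it occurs far from its learned semantic location. The matches, offsets, and spatial-prior weights below are synthetic placeholders.

```python
import numpy as np

# Sketch of spatially weighted generalized Hough voting: each matched
# visual word or phrase votes for a face-center hypothesis, weighted by
# how consistent its location is with where that word typically occurs.

rng = np.random.default_rng(4)
H, W = 100, 100
votes = np.zeros((H, W))

# (observed position, predicted offset to face center, spatial prior weight)
matches = [((30, 40), (10, 5), 0.9),
           ((35, 42), (5, 3), 0.8),
           ((70, 20), (-20, 25), 0.1)]   # off-location word: low weight

for (y, x), (dy, dx), w in matches:
    cy, cx = y + dy, x + dx
    if 0 <= cy < H and 0 <= cx < W:
        votes[cy, cx] += w

peak = np.unravel_index(votes.argmax(), votes.shape)
print("face center hypothesis:", peak)
```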
Citations: 27
Multiple-Hypothesis Affine Region Estimation with Anisotropic LoG Filters
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.74
Takahiro Hasegawa, Mitsuru Ambai, Kohta Ishikawa, G. Koutaki, Yuji Yamauchi, Takayoshi Yamashita, H. Fujiyoshi
We propose a method for estimating multiple-hypothesis affine regions from a keypoint by using an anisotropic Laplacian-of-Gaussian (LoG) filter. Although conventional affine region detectors, such as Hessian/Harris-Affine, iterate to find an affine region that fits a given image patch, such iterative searching is adversely affected by the initial point. To avoid this problem, we allow multiple detections from a single keypoint. We demonstrate that the responses of all possible anisotropic LoG filters can be efficiently computed by factorizing them in a manner similar to spectral SIFT. A large number of LoG filters, densely sampled in a parameter space, are reconstructed as weighted combinations of a limited number of representative filters, called "eigenfilters", obtained by singular value decomposition. Also, the reconstructed filter responses of the sampled parameters can be interpolated into a continuous representation using a series of proper functions. This results in efficient multiple-extrema searching in a continuous space. Experiments revealed that our method has higher repeatability than conventional methods.
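The factorization step is plain SVD over the flattened filter bank: stack the densely sampled kernels as rows, keep the leading singular vectors as eigenfilters, and reconstruct every filter (and hence, by linearity of convolution, every response) from them. The sketch below builds the anisotropic LoG kernels numerically and checks the low-rank reconstruction error; kernel sizes and parameter ranges are illustrative.

```python
import numpy as np

# Sketch of the eigenfilter idea: densely sample anisotropic LoG kernels
# over scale and orientation, stack them into a matrix, and use SVD to
# find a few representative "eigenfilters" whose weighted combinations
# reconstruct the whole bank.

def aniso_log(sx, sy, theta, size=21):
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    ct, st = np.cos(theta), np.sin(theta)
    u, v = ct * X + st * Y, -st * X + ct * Y        # rotated coordinates
    g = np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))
    gxx = np.gradient(np.gradient(g, axis=1), axis=1)
    gyy = np.gradient(np.gradient(g, axis=0), axis=0)
    return gxx + gyy                                # numerical Laplacian of G

# Densely sampled bank: many (sx, sy, theta) combinations
bank = np.stack([aniso_log(sx, sy, th).ravel()
                 for sx in np.linspace(1.5, 4, 8)
                 for sy in np.linspace(1.5, 4, 8)
                 for th in np.linspace(0, np.pi, 8, endpoint=False)])

U, s, Vt = np.linalg.svd(bank, full_matrices=False)
k = 10                                              # number of eigenfilters
approx = (U[:, :k] * s[:k]) @ Vt[:k]
err = np.linalg.norm(bank - approx) / np.linalg.norm(bank)
print(f"relative reconstruction error with {k} eigenfilters: {err:.3f}")
```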
Citations: 7
Joint Probabilistic Data Association Revisited
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.349
S. H. Rezatofighi, Anton Milan, Zhen Zhang, Javen Qinfeng Shi, A. Dick, I. Reid
In this paper, we revisit the joint probabilistic data association (JPDA) technique and propose a novel solution based on recent developments in finding the m-best solutions to an integer linear program. The key advantage of this approach is that it makes JPDA computationally tractable in applications with high target and/or clutter density, such as spot tracking in fluorescence microscopy sequences and pedestrian tracking in surveillance footage. We also show that our JPDA algorithm embedded in a simple tracking framework is surprisingly competitive with state-of-the-art global tracking methods in these two applications, while needing considerably less processing time.
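What JPDA ultimately needs are marginal target-to-measurement association probabilities, i.e. the sum over all feasible joint association events weighted by their likelihoods; the paper's contribution is approximating that sum with the m-best solutions of an integer linear program. The sketch below computes the exact marginals for a tiny problem by brute-force enumeration, with synthetic likelihoods.

```python
import numpy as np
from itertools import product

# Sketch of what JPDA computes: marginal association probabilities,
# obtained by summing over all feasible joint association events
# weighted by their likelihoods. Here the tiny problem is enumerated
# exactly; the paper approximates this with m-best ILP solutions.

n_targets, n_meas = 2, 3
rng = np.random.default_rng(5)
lik = rng.random((n_targets, n_meas + 1))   # last column: missed detection

marginals = np.zeros((n_targets, n_meas + 1))
total = 0.0
# A joint event assigns each target one measurement index (or "missed"),
# with no measurement shared by two targets.
for event in product(range(n_meas + 1), repeat=n_targets):
    used = [j for j in event if j < n_meas]
    if len(used) != len(set(used)):
        continue                            # measurement used twice: infeasible
    p = np.prod([lik[t, j] for t, j in enumerate(event)])
    total += p
    for t, j in enumerate(event):
        marginals[t, j] += p

marginals /= total
print(np.round(marginals, 3))               # rows sum to 1
```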
Citations: 309
Probabilistic Appearance Models for Segmentation and Classification
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.198
J. Krüger, J. Ehrhardt, H. Handels
Statistical shape and appearance models are often based on the accurate identification of one-to-one correspondences in a training data set. At the same time, determining these corresponding landmarks is the most challenging part of such methods. Hufnagel et al. developed an alternative method using correspondence probabilities for a statistical shape model. We propose the use of probabilistic correspondences for statistical appearance models by incorporating appearance information into the framework. A point-based representation is employed, representing the image by a set of vectors combining position and appearance. Using probabilistic correspondences between these multi-dimensional feature vectors eliminates the need for extensive preprocessing to find corresponding landmarks and reduces the dependence of the generated model on the landmark positions. A maximum a posteriori approach is then used to derive a single global optimization criterion with respect to model parameters and observation-dependent parameters that directly affects the shape and appearance information of the considered structures. Model generation and fitting can be expressed by optimizing the same criterion. The developed framework describes the modeling process in a concise and flexible mathematical way and allows for additional constraints, such as topological regularity, in the modeling process. Furthermore, it eliminates the demand for costly correspondence determination. We apply the model to segmentation and landmark identification in hand X-ray images, where segmentation information is modeled as further features in the vectorial image representation. The results demonstrate the feasibility of the model for reconstructing contours and landmarks in unseen test images. Furthermore, we apply the model to tissue classification, where a model is generated for healthy brain tissue using 2D MRI slices. Applying the model to images of stroke patients, the probabilistic correspondences are used to classify between healthy and pathological structures. The results demonstrate the ability of the probabilistic model to recognize healthy and pathological tissue automatically.
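The probabilistic correspondences replace hard landmark matches with soft Gaussian weights between multi-dimensional feature vectors, as in EM-style point-set registration. A minimal sketch of computing such a correspondence matrix over position-plus-appearance vectors follows; the data and the bandwidth sigma are illustrative.

```python
import numpy as np

# Sketch of probabilistic correspondences: instead of hard one-to-one
# landmarks, every model vector (position + appearance) is softly
# matched to every observed vector with a Gaussian weight.

rng = np.random.default_rng(6)
model = np.hstack([rng.random((5, 2)), rng.random((5, 3))])   # 5 vectors: xy + appearance
obs = model + 0.05 * rng.standard_normal(model.shape)         # noisy observation

sigma = 0.1
d2 = ((model[:, None, :] - obs[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
P = np.exp(-d2 / (2 * sigma**2))
P /= P.sum(axis=1, keepdims=True)   # correspondence probabilities per model vector
print(np.round(P, 2))               # near-identity: soft but confident matches
```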
Citations: 6