
Latest publications: 2014 IEEE Conference on Computer Vision and Pattern Recognition

Multiview Shape and Reflectance from Natural Illumination
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.277
Geoffrey Oxholm, K. Nishino
The world is full of objects with complex reflectances, situated in complex illumination environments. Past work on full 3D geometry recovery, however, has tried to handle this complexity by framing it into simplistic models of reflectance (Lambertian, mirrored, or diffuse plus specular) or illumination (one or more point light sources). Though there has been some recent progress in directly utilizing such complexities for recovering single-view geometry, it is not clear how such single-view methods can be extended to reconstruct the full geometry. To this end, we derive a probabilistic geometry estimation method that fully exploits the rich signal embedded in complex appearance. Though each observation provides partial and unreliable information, we show how to estimate the reflectance responsible for the diverse appearance and unite the orientation cues embedded in each observation to reconstruct the underlying geometry. We demonstrate the effectiveness of our method on synthetic and real-world objects. The results show that our method performs accurately across a wide range of real-world environments and reflectances that lie between the extremes that have been the focus of past work.
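The paper's probabilistic formulation is beyond the scope of an abstract, but the core idea of uniting per-view orientation cues can be sketched minimally. The numpy sketch below assumes hypothetical per-view normal estimates and confidences (none of these quantities come from the paper) and fuses them by confidence-weighted averaging:

```python
import numpy as np

# Hypothetical per-view normal estimates for 100 surface points from 5 views,
# each with a confidence in (0, 1] (e.g. how well the estimated reflectance
# explains the observed appearance).
rng = np.random.default_rng(0)
true_normals = rng.normal(size=(100, 3))
true_normals /= np.linalg.norm(true_normals, axis=1, keepdims=True)

views = [true_normals + 0.3 * rng.normal(size=(100, 3)) for _ in range(5)]
confidences = [rng.uniform(0.1, 1.0, size=(100, 1)) for _ in range(5)]

# Unite the orientation cues: confidence-weighted average, renormalized.
fused = sum(c * v for c, v in zip(confidences, views))
fused /= np.linalg.norm(fused, axis=1, keepdims=True)
```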
Citations: 62
Multimodal Learning in Loosely-Organized Web Images
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.316
Kun Duan, David J. Crandall, Dhruv Batra
Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images. In addition to image data, these websites collect a variety of multimodal metadata about photos including text tags, captions, GPS coordinates, camera metadata, user profiles, etc. However, this metadata is not well constrained and is often noisy, sparse, or missing altogether. In this paper, we propose a framework to model these "loosely organized" multimodal datasets, and show how to perform loosely-supervised learning using a novel latent Conditional Random Field framework. We learn parameters of the LCRF automatically from a small set of validation data, using Information Theoretic Metric Learning (ITML) to learn distance functions and a structural SVM formulation to learn the potential functions. We apply our framework on four datasets of images from Flickr, evaluating both qualitatively and quantitatively against several baselines.
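The role ITML plays here, producing a Mahalanobis metric used as a distance function, is easy to illustrate in isolation. A minimal sketch with a toy diagonal metric; in the paper the matrix would come from solving the ITML optimization on validation pairs:

```python
import numpy as np

def mahalanobis(x, y, M):
    # Distance under a learned positive semi-definite matrix M,
    # the kind of metric a learner such as ITML produces.
    d = x - y
    return float(np.sqrt(d @ M @ d))

# Toy stand-in for a learned metric: the first feature counts most.
M = np.diag([4.0, 1.0, 1.0])
x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
print(mahalanobis(x, y, M))  # 3.0, versus ~2.45 Euclidean
```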
Citations: 14
Looking Beyond the Visible Scene
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.474
A. Khosla, Byoungkwon An, Joseph J. Lim, A. Torralba
A common thread that ties together many prior works in scene understanding is their focus on the aspects directly present in a scene such as its categorical classification or the set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels - it is so much more. From a simple observation of a scene, we can tell a lot about the environment surrounding the scene such as the potential establishments near it, the potential crime rate in the area, or even the economic climate. Here, we explore several of these aspects from both the human perception and computer vision perspective. Specifically, we show that it is possible to predict the distance of surrounding establishments such as McDonald's or hospitals even by using scenes located far from them. We go a step further to show that both humans and computers perform well at navigating the environment based only on visual cues from scenes. Lastly, we show that it is possible to predict the crime rates in an area simply by looking at a scene without any real-time criminal activity. Simply put, here, we illustrate that it is possible to look beyond the visible scene.
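As a purely hypothetical illustration of the kind of prediction task described, with all names and data made up rather than taken from the paper, a closed-form ridge regression from scene descriptors to a continuous target such as log-distance to the nearest establishment could look like:

```python
import numpy as np

# Made-up data: rows are scene feature vectors (e.g. pooled image
# descriptors), targets are log-distances to the nearest hospital.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 64))
y = X @ rng.normal(size=64) + 0.1 * rng.normal(size=200)

# Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
predictions = X @ w
```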
Citations: 72
Automatic Face Reenactment
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.537
Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormählen, P. Pérez, C. Theobalt
We propose an image-based facial reenactment system that replaces the face of an actor in an existing target video with the face of a user from a source video, while preserving the original target performance. Our system is fully automatic and does not require a database of source expressions. Instead, it is able to produce convincing reenactment results from a short source video captured with an off-the-shelf camera, such as a webcam, in which the user performs arbitrary facial gestures. Our reenactment pipeline is conceived as part image retrieval and part face transfer: the image retrieval is based on temporal clustering of target frames and a novel image matching metric that combines appearance and motion to select candidate frames from the source video, while the face transfer uses a 2D warping strategy that preserves the user's identity. Our system excels in simplicity: it does not rely on a 3D face model, is robust under head motion, and does not require the source and target performances to be similar. We show convincing reenactment results for videos that we recorded ourselves and for low-quality footage taken from the Internet.
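The retrieval step scores candidate source frames by combining appearance and motion cues. One plausible form of such a combined cost, with the weighting and the individual distances assumed rather than taken from the paper:

```python
import numpy as np

def match_cost(appearance_dist, motion_dist, alpha=0.5):
    # Lower is better; alpha trades appearance similarity off
    # against motion similarity (hypothetical weighting).
    return alpha * appearance_dist + (1.0 - alpha) * motion_dist

# Pick the best source frame for one target frame.
appearance = np.array([0.8, 0.3, 0.5])  # e.g. landmark differences
motion = np.array([0.2, 0.6, 0.1])      # e.g. optical-flow differences
best = int(np.argmin(match_cost(appearance, motion)))  # frame 2
```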
Citations: 157
Noising versus Smoothing for Vertex Identification in Unknown Shapes
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.530
Konstantinos A. Raftopoulos, Marin Ferecatu
A method is presented for identifying local shape features on a shape's boundary in a way that is facilitated, rather than hindered, by the presence of noise. The boundary is seen as a real function. A study of a certain distance function reveals, almost counter-intuitively, that vertices can be defined and localized better in the presence of noise; thus the concept of noising, as opposed to smoothing, is conceived and presented. The method works on both smooth and noisy shapes, with the presence of noise improving on the results of the smoothed version. Experiments with noise and a comparison to the state of the art validate the method.
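To make the noising idea concrete, here is a small sketch using one plausible distance function, the distance of boundary points to the shape centroid, as a stand-in for the function studied in the paper. Corners appear as local maxima of this function, and a mild boundary perturbation leaves those peaks pronounced relative to the flat sides:

```python
import numpy as np

# Densely sample a unit-square contour; its four corners are the vertices.
t = np.linspace(0.0, 4.0, 400, endpoint=False)
side, frac = np.divmod(t, 1.0)
x = np.where(side == 0, frac, np.where(side == 1, 1.0, np.where(side == 2, 1.0 - frac, 0.0)))
y = np.where(side == 0, 0.0, np.where(side == 1, frac, np.where(side == 2, 1.0, 1.0 - frac)))
boundary = np.stack([x, y], axis=1)

# Stand-in distance function: boundary point to shape centroid.
centroid = boundary.mean(axis=0)
d = np.linalg.norm(boundary - centroid, axis=1)

# "Noising": perturb the boundary and recompute the function.
noisy = boundary + 0.01 * np.random.default_rng(2).normal(size=boundary.shape)
d_noisy = np.linalg.norm(noisy - centroid, axis=1)

# Vertices show up as local maxima of the clean function.
corners = np.where((d > np.roll(d, 1)) & (d > np.roll(d, -1)))[0]
print(boundary[corners])  # approximately the four square corners
```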
Citations: 7
Partial Occlusion Handling for Visual Tracking via Robust Part Matching
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.164
Tianzhu Zhang, K. Jia, Changsheng Xu, Yi Ma, N. Ahuja
Part-based visual tracking is advantageous due to its robustness against partial occlusion. However, how to effectively exploit the confidence scores of individual parts to construct a robust tracker is still a challenging problem. In this paper, we address this problem by simultaneously matching parts in each of multiple frames, which is realized by a locality-constrained low-rank sparse learning method that establishes multi-frame part correspondences through optimization of partial permutation matrices. The proposed part matching tracker (PMT) has a number of attractive properties. (1) It exploits the spatial-temporal locality-constrained property for robust part matching. (2) It matches local parts from multiple frames jointly by considering their low-rank and sparse structure information, which can effectively handle part appearance variations due to occlusion or noise. (3) The proposed PMT model has the inbuilt mechanism of leveraging multi-mode target templates, so that the dilemma of template updating when encountering occlusion in tracking can be better handled. This contrasts with existing methods that only do part matching between a pair of frames. We evaluate PMT and compare with 10 popular state-of-the-art methods on challenging benchmarks. Experimental results show that PMT consistently outperforms these existing trackers.
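The joint multi-frame optimization over partial permutation matrices is the paper's contribution; a simplified two-frame stand-in using the Hungarian algorithm still conveys what a part correspondence is (the descriptors and sizes below are made up):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up part descriptors for two frames (6 parts, 32-D each).
rng = np.random.default_rng(3)
parts_a = rng.normal(size=(6, 32))
parts_b = parts_a[::-1] + 0.1 * rng.normal(size=(6, 32))  # shuffled + noisy

# Pairwise appearance distances, then optimal one-to-one matching.
cost = np.linalg.norm(parts_a[:, None, :] - parts_b[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows.tolist(), cols.tolist())))  # recovers the reversal
```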
Citations: 117
Joint Depth Estimation and Camera Shake Removal from Single Blurry Image
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.370
Zhe Hu, Li Xu, Ming-Hsuan Yang
Camera shake during exposure often results in a spatially variant blur effect in the image. This non-uniform blur is caused not only by the camera motion but also by depth variation in the scene: objects close to the camera sensor are likely to appear more blurry than those at a distance. However, recent non-uniform deblurring methods do not explicitly consider the depth factor, or assume fronto-parallel scenes with constant depth for simplicity. While single-image non-uniform deblurring is a challenging problem, the blurry results in fact contain depth information which can be exploited. We propose to jointly estimate scene depth and remove non-uniform blur caused by camera motion by exploiting their underlying geometric relationships, with only a single blurry image as input. To this end, we present a unified layer-based model for depth-involved deblurring. We provide a novel layer-based solution using matting to partition the layers and an expectation-maximization scheme to solve this problem. This approach largely reduces the number of unknowns and makes the problem tractable. Experiments on challenging examples demonstrate that both depth estimation and camera shake removal can be well addressed within the unified framework.
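A toy forward model helps see why depth and deblurring are coupled: under camera translation, nearer layers blur more. The sketch below blurs depth layers with kernels whose size grows for nearer layers, a crude proxy for the paper's camera motion model rather than its actual formulation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(4)
image = rng.random((64, 64))
depth = np.tile(np.linspace(1.0, 4.0, 64), (64, 1))  # made-up depth map

# Composite per-layer blurs: nearer layers (small depth) get larger kernels.
blurred = np.zeros_like(image)
for lo, hi, ksize in [(1.0, 2.0, 7), (2.0, 3.0, 5), (3.0, 4.01, 3)]:
    mask = (depth >= lo) & (depth < hi)
    blurred[mask] = uniform_filter(image, size=ksize)[mask]
```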
Citations: 73
Learning Non-linear Reconstruction Models for Image Set Classification
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.246
Munawar Hayat, Bennamoun, S. An
We propose a deep learning framework for image set classification with application to face recognition. An Adaptive Deep Network Template (ADNT) is defined whose parameters are initialized by performing unsupervised pre-training in a layer-wise fashion using Gaussian Restricted Boltzmann Machines (GRBMs). The pre-initialized ADNT is then separately trained for images of each class and class-specific models are learnt. Based on the minimum reconstruction error from the learnt class-specific models, a majority voting strategy is used for classification. The proposed framework is extensively evaluated for the task of image set classification based face recognition on Honda/UCSD, CMU Mobo, YouTube Celebrities and a Kinect dataset. Our experimental results and comparisons with existing state-of-the-art methods show that the proposed method consistently achieves the best performance on all these datasets.
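The decision rule, minimum reconstruction error per image followed by majority voting over the set, can be sketched independently of the GRBM-pretrained ADNT. Below, per-class PCA subspaces stand in for the learnt class-specific reconstruction models; everything here is illustrative, not the paper's architecture:

```python
import numpy as np

def fit_class_model(X, k=8):
    # Stand-in for a class-specific reconstruction model: a PCA
    # subspace fitted to that class's training images.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def recon_error(x, model):
    mean, basis = model
    z = (x - mean) @ basis.T
    return float(np.linalg.norm(x - (mean + z @ basis)))

def classify_image_set(test_set, models):
    # Each image votes for the class whose model reconstructs it best;
    # the set label is the majority vote.
    votes = [min(models, key=lambda c: recon_error(x, models[c])) for x in test_set]
    return max(set(votes), key=votes.count)

rng = np.random.default_rng(5)
models = {c: fit_class_model(rng.normal(loc=c, size=(50, 100))) for c in (0, 1, 2)}
print(classify_image_set(rng.normal(loc=1, size=(10, 100)), models))  # 1
```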
Citations: 72
Calibrating a Non-isotropic Near Point Light Source Using a Plane
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.290
Jaesik Park, Sudipta N. Sinha, Y. Matsushita, Yu-Wing Tai, In-So Kweon
We show that a non-isotropic near point light source rigidly attached to a camera can be calibrated using multiple images of a weakly textured planar scene. We prove that if the radiant intensity distribution (RID) of a light source is radially symmetric with respect to its dominant direction, then the shading observed on a Lambertian scene plane is bilaterally symmetric with respect to a 2D line on the plane. The symmetry axis detected in an image provides a linear constraint for estimating the dominant light axis. The light position and RID parameters can then be estimated using a linear method. Specular highlights, if available, can also be used for light position estimation. We also extend our method to handle non-Lambertian reflectances, which we model using a biquadratic BRDF. We have quantitatively evaluated our method on synthetic data. Our experiments on real scenes show that our method works well in practice and enables light calibration without the need for specialized hardware.
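One common parameterization of a radially symmetric RID, not necessarily the paper's exact choice, is a cosine-power falloff about the dominant axis combined with inverse-square distance attenuation:

```python
import numpy as np

def radiance_at(p, light_pos, light_axis, mu=2.0, intensity=1.0):
    # Radially symmetric RID: emission falls off as cos^mu of the
    # angle from the dominant axis, attenuated by squared distance.
    # (Illustrative model; mu and the functional form are assumptions.)
    to_p = p - light_pos
    r = float(np.linalg.norm(to_p))
    cos_theta = max(0.0, float(to_p @ light_axis) / r)
    return intensity * cos_theta**mu / r**2

light_pos = np.array([0.0, 0.0, 1.0])
light_axis = np.array([0.0, 0.0, -1.0])  # dominant direction, unit length
print(radiance_at(np.array([0.2, 0.0, 0.0]), light_pos, light_axis))
```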
Citations: 31
Diversity-Enhanced Condensation Algorithm and Its Application for Robust and Accurate Endoscope Three-Dimensional Motion Tracking
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.163
Xióngbiao Luó, Ying Wan, Xiangjian He, Jie Yang, K. Mori
This paper proposes a diversity-enhanced condensation algorithm to address the particle impoverishment problem from which stochastic filtering usually suffers. Particle diversity plays an important role, as it affects the performance of filtering. Although the condensation algorithm is widely used in computer vision, it easily gets trapped in local minima due to particle degeneracy. We introduce a modified evolutionary computing method, adaptive differential evolution, to resolve particle impoverishment under a properly sized particle population. We apply the proposed method to endoscope tracking, estimating the three-dimensional motion of the endoscopic camera. The experimental results demonstrate that our method offers more robust and accurate tracking than previous methods: tracking smoothness and error were significantly reduced from (3.7 mm, 4.8 mm) to (2.3 mm, 3.2 mm), approaching the clinical requirement of 3.0 mm.
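A minimal sketch of re-diversifying a particle set with one DE/rand/1/bin pass, keeping a mutated particle only when its fitness improves; the state dimensions and fitness function are made up, and the paper's adaptive variant tunes F and CR, which are fixed here:

```python
import numpy as np

def de_diversify(particles, fitness, F=0.5, CR=0.9, rng=None):
    # One DE/rand/1/bin pass: propose mutants from particle triples and
    # accept a trial only if it scores better, spreading particles away
    # from a degenerate cluster without losing good hypotheses.
    rng = rng or np.random.default_rng()
    n, d = particles.shape
    out = particles.copy()
    for i in range(n):
        a, b, c = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        mutant = particles[a] + F * (particles[b] - particles[c])
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True  # guarantee at least one crossed dimension
        trial = np.where(cross, mutant, particles[i])
        if fitness(trial) > fitness(particles[i]):
            out[i] = trial
    return out

rng = np.random.default_rng(6)
particles = rng.normal(size=(30, 3))  # e.g. 3-D camera translation states
fitness = lambda s: -float(np.linalg.norm(s - np.array([1.0, 2.0, 0.5])))
particles = de_diversify(particles, fitness, rng=rng)
```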
Citations: 4