
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Light Field Reconstruction Using Deep Convolutional Network on EPI
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.178
Gaochang Wu, Mandan Zhao, Liangyong Wang, Qionghai Dai, Tianyou Chai, Yebin Liu
In this paper, we take advantage of the clear texture structure of the epipolar plane image (EPI) in the light field data and model the problem of light field reconstruction from a sparse set of views as a CNN-based angular detail restoration on EPI. We indicate that one of the main challenges in sparsely sampled light field reconstruction is the information asymmetry between the spatial and angular domain, where the detail portion in the angular domain is damaged by undersampling. To balance the spatial and angular information, the spatial high-frequency components of an EPI are removed using EPI blur before it is fed to the network. Finally, a non-blind deblur operation is used to recover the spatial detail suppressed by the EPI blur. We evaluate our approach on several datasets including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms. We also show a further application for depth enhancement by using the reconstructed light field.
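To make the blur, CNN restoration, and non-blind deblur stages concrete, here is a minimal Python sketch of the same pipeline on a single EPI. The trained restoration CNN is represented only by a hypothetical `restore_fn` callable, and the Gaussian blur kernel and Wiener regularization are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, zoom

def reconstruct_epi(epi_sparse, angular_factor=4, sigma=1.5, restore_fn=None):
    """Sketch of the blur -> CNN restoration -> non-blind deblur pipeline on one EPI.

    epi_sparse : 2D array (n_views, width), an EPI from a sparsely sampled light field.
    restore_fn : stand-in for the trained CNN that restores angular detail; here it is
                 a hypothetical callable, defaulting to identity.
    """
    # 1. "EPI blur": suppress spatial high frequencies along the spatial axis so that
    #    spatial and angular information are balanced before restoration.
    blurred = gaussian_filter1d(epi_sparse, sigma=sigma, axis=1)

    # 2. Upsample the angular axis to the target view count and let the (hypothetical)
    #    CNN restore the angular detail lost by undersampling.
    upsampled = zoom(blurred, (angular_factor, 1), order=1)
    restored = restore_fn(upsampled) if restore_fn is not None else upsampled

    # 3. Non-blind deblur: the blur kernel is known, so invert it per row with a
    #    simple frequency-domain Wiener filter to recover spatial detail.
    width = restored.shape[1]
    kernel = np.exp(-0.5 * (np.arange(width) - width // 2) ** 2 / sigma ** 2)
    kernel /= kernel.sum()
    K = np.fft.fft(np.fft.ifftshift(kernel))
    wiener = np.conj(K) / (np.abs(K) ** 2 + 1e-3)
    return np.real(np.fft.ifft(np.fft.fft(restored, axis=1) * wiener, axis=1))

epi = np.random.rand(4, 256)          # 4 input views, 256 spatial samples
dense_epi = reconstruct_epi(epi)      # (16, 256) reconstructed EPI
```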
Citations: 178
Attentional Correlation Filter Network for Adaptive Visual Tracking
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.513
Jongwon Choi, H. Chang, Sangdoo Yun, Tobias Fischer, Y. Demiris, J. Choi
We propose a new tracking framework with an attentional mechanism that chooses a subset of the associated correlation filters for increased robustness and computational efficiency. The subset of filters is adaptively selected by a deep attentional network according to the dynamic properties of the tracking target. Our contributions are manifold, and are summarised as follows: (i) Introducing the Attentional Correlation Filter Network which allows adaptive tracking of dynamic targets. (ii) Utilising an attentional network which shifts the attention to the best candidate modules, as well as predicting the estimated accuracy of currently inactive modules. (iii) Enlarging the variety of correlation filters which cover target drift, blurriness, occlusion, scale changes, and flexible aspect ratio. (iv) Validating the robustness and efficiency of the attentional mechanism for visual tracking through a number of experiments. Our method achieves similar performance to non real-time trackers, and state-of-the-art performance amongst real-time trackers.
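As an illustration of the attentional selection idea (not the authors' implementation), the sketch below activates only the top-k correlation filter modules according to externally supplied attention scores and fuses their response maps; the attention network itself and the filter updates are assumed to exist elsewhere.

```python
import numpy as np

def track_frame(features, filter_bank, attention_scores, k=3):
    """Sketch of attentional filter selection, assuming `filter_bank` is a list of
    frequency-domain correlation filters and `attention_scores` comes from a separate
    (hypothetical) attention network scoring each module for the current target state."""
    # Keep only the k modules the attention network currently trusts most.
    active = np.argsort(attention_scores)[-k:]

    F = np.fft.fft2(features)
    responses = {i: np.real(np.fft.ifft2(F * np.conj(filter_bank[i]))) for i in active}

    # Fuse the active response maps (weighted by attention) and localise the target.
    fused = sum(attention_scores[i] * responses[i] for i in active)
    dy, dx = np.unravel_index(np.argmax(fused), fused.shape)
    return dy, dx, active

# Toy usage: 5 random filters over a 64x64 feature patch.
rng = np.random.default_rng(0)
filters = [np.fft.fft2(rng.standard_normal((64, 64))) for _ in range(5)]
scores = rng.random(5)
print(track_frame(rng.standard_normal((64, 64)), filters, scores))
```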
Citations: 291
Beyond Instance-Level Image Retrieval: Leveraging Captions to Learn a Global Visual Representation for Semantic Retrieval
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.560
Albert Gordo, Diane Larlus
Querying with an example image is a simple and intuitive interface to retrieve information from a visual database. Most of the research in image retrieval has focused on the task of instance-level image retrieval, where the goal is to retrieve images that contain the same object instance as the query image. In this work we move beyond instance-level retrieval and consider the task of semantic image retrieval in complex scenes, where the goal is to retrieve images that share the same semantics as the query image. We show that, despite its subjective nature, the task of semantically ranking visual scenes is consistently implemented across a pool of human annotators. We also show that a similarity based on human-annotated region-level captions is highly correlated with the human ranking and constitutes a good computable surrogate. Following this observation, we learn a visual embedding of the images where the similarity in the visual space is correlated with their semantic similarity surrogate. We further extend our model to learn a joint embedding of visual and textual cues that allows one to query the database using a text modifier in addition to the query image, adapting the results to the modifier. Finally, our model can ground the ranking decisions by showing regions that contributed the most to the similarity between pairs of images, providing a visual explanation of the similarity.
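A minimal sketch of how a caption-based semantic surrogate can drive a visual embedding, assuming bag-of-words caption vectors and a simple margin-based ranking loss; the paper's actual architecture, surrogate definition, and text-modifier extension are not reproduced here.

```python
import torch
import torch.nn.functional as F

def caption_similarity(bow_a, bow_b):
    """Semantic surrogate: cosine similarity between bag-of-words vectors built from
    the human-annotated region-level captions of two images (a simplification used
    here only to drive the loss)."""
    return F.cosine_similarity(bow_a, bow_b, dim=-1)

def ranking_loss(emb_q, emb_a, emb_b, cap_q, cap_a, cap_b, margin=0.1):
    """If image a is semantically closer to the query than image b (per captions),
    push its visual embedding closer to the query's by at least `margin`."""
    sign = torch.sign(caption_similarity(cap_q, cap_a) - caption_similarity(cap_q, cap_b))
    d_a = 1 - F.cosine_similarity(emb_q, emb_a, dim=-1)
    d_b = 1 - F.cosine_similarity(emb_q, emb_b, dim=-1)
    return F.relu(margin + sign * (d_a - d_b)).mean()

# Toy usage with random 512-d visual embeddings and 1000-d caption BoW vectors.
q, a, b = (torch.randn(8, 512) for _ in range(3))
cq, ca, cb = (torch.rand(8, 1000) for _ in range(3))
print(ranking_loss(q, a, b, cq, ca, cb))
```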
Citations: 81
Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.314
Y. Yoo, Sangdoo Yun, H. Chang, Y. Demiris, J. Choi
This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimensional image responses, which is not handled by existing regression algorithms, is proposed. (ii) The proposed regression method introduces a strategy to learn the latent space as well as the encoder and decoder so that the result of the regressed response in the latent space coincides with the corresponding response in the data space. (iii) The proposed regression is embedded into a generative model, and the whole procedure is developed within the variational autoencoder framework. We demonstrate the robustness and effectiveness of our method through a number of experiments on various visual data regression problems.
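The sketch below illustrates the overall structure (encoder, decoder, and a regressor into the latent space trained jointly). Note that the Gaussian process regressor of the paper is replaced by a small MLP purely to keep the example short, so this is a structural sketch rather than the proposed method.

```python
import torch
import torch.nn as nn

class VAERegressor(nn.Module):
    """Minimal sketch: a VAE whose latent code is additionally tied to a regressor
    from the input variable x, so decoding the regressed latent yields the image
    response. The GP regression of the paper is replaced by an MLP for brevity."""
    def __init__(self, img_dim=784, x_dim=1, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(), nn.Linear(128, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, img_dim))
        self.reg = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))

    def loss(self, x, y):
        mu, logvar = self.enc(y).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = ((self.dec(z) - y) ** 2).sum(-1).mean()                   # reconstruction
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1).mean()  # latent prior
        latent_fit = ((self.reg(x) - mu) ** 2).sum(-1).mean()             # regression-to-latent
        return recon + kl + latent_fit

model = VAERegressor()
print(model.loss(torch.randn(32, 1), torch.rand(32, 784)))
```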
Citations: 22
Video Desnowing and Deraining Based on Matrix Decomposition
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.303
Weihong Ren, Jiandong Tian, Zhi Han, Antoni B. Chan, Yandong Tang
The existing snow/rain removal methods often fail for heavy snow/rain and dynamic scenes. One reason for the failure is the assumption that all the snowflakes/rain streaks are sparse in snow/rain scenes. The other is that the existing methods often cannot differentiate moving objects from snowflakes/rain streaks. In this paper, we propose a model based on matrix decomposition for video desnowing and deraining to solve the problems mentioned above. We divide snowflakes/rain streaks into two categories: sparse ones and dense ones. With background fluctuations and optical flow information, the detection of moving objects and sparse snowflakes/rain streaks is formulated as a multi-label Markov Random Field (MRF). As for dense snowflakes/rain streaks, they are considered to obey a Gaussian distribution. The snowflakes/rain streaks, including sparse ones and dense ones, in scene backgrounds are removed by low-rank representation of the backgrounds. Meanwhile, a group sparsity term in our model is designed to filter snow/rain pixels within the moving objects. Experimental results show that our proposed model performs better than the state-of-the-art methods for snow and rain removal.
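For intuition, here is a toy low-rank plus sparse decomposition of a video in the spirit of the model described above; the MRF-based separation of moving objects from streaks, the optical-flow cues, and the group-sparsity term are all omitted, so this is only a rough illustration.

```python
import numpy as np

def desnow(frames, lam=0.05, n_iter=50):
    """Toy low-rank + sparse decomposition of a video (one column per frame): the
    low-rank part models the background, the sparse part collects snowflakes/rain
    streaks and moving objects. This is a simplification of the paper's model."""
    D = np.stack([f.ravel() for f in frames], axis=1).astype(float)
    L, S = np.zeros_like(D), np.zeros_like(D)
    for _ in range(n_iter):
        # Low-rank update: singular-value thresholding of the residual.
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * np.maximum(s - 1.0, 0.0)) @ Vt
        # Sparse update: soft-threshold what the background cannot explain.
        R = D - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S

frames = [np.random.rand(32, 32) for _ in range(20)]
background, streaks = desnow(frames)
```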
Citations: 129
Image Splicing Detection via Camera Response Function Analysis
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.203
Can Chen, Scott McCloskey, Jingyi Yu
Recent advances in image manipulation techniques have made image forgery detection increasingly more challenging. An important component in such tools is to fake motion and/or defocus blurs through boundary splicing and copy-move operators, to emulate wide aperture and slow shutter effects. In this paper, we present a new technique based on the analysis of the camera response function (CRF) for efficient and robust splicing and copy-move forgery detection and localization. We first analyze how non-linear CRFs affect edges in terms of the intensity-gradient bivariable histograms. We show distinguishable shape differences on real vs. forged blurs near edges after a splicing operation. Based on our analysis, we introduce a deep-learning framework to detect and localize forged edges. In particular, we show the problem can be transformed into a handwriting recognition problem and resolved by using a convolutional neural network. We generate a large dataset of forged images produced by splicing followed by retouching, and comprehensive experiments show our proposed method outperforms the state-of-the-art techniques in accuracy and robustness.
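A small sketch of the edge feature the abstract refers to: a joint intensity/gradient histogram of a patch straddling an edge. The CNN classifier, dataset generation, and CRF-specific analysis are not shown, and the histogram parameters are assumptions.

```python
import numpy as np

def intensity_gradient_histogram(patch, bins=32):
    """Build a joint (intensity, gradient-magnitude) histogram of a patch around an
    edge. In the paper such bivariable histograms, whose shape is governed by the
    camera response function, are classified by a CNN; here we only build the feature."""
    gy, gx = np.gradient(patch.astype(float))
    grad = np.hypot(gx, gy)
    hist, _, _ = np.histogram2d(
        patch.ravel(), grad.ravel(),
        bins=bins, range=[[0.0, 1.0], [0.0, grad.max() + 1e-8]])
    return hist / hist.sum()

# Toy usage: a synthetic soft edge (as left by a non-linear CRF plus blur).
x = np.linspace(0, 1, 64)
patch = np.tile(1 / (1 + np.exp(-20 * (x - 0.5))), (64, 1))
feat = intensity_gradient_histogram(patch)   # (32, 32) feature for the classifier
```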
Citations: 39
3D Point Cloud Registration for Localization Using a Deep Neural Network Auto-Encoder
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.265
Gil Elbaz, Tamar Avraham, A. Fischer
We present an algorithm for registration between a large-scale point cloud and a close-proximity scanned point cloud, providing a localization solution that is fully independent of prior information about the initial positions of the two point cloud coordinate systems. The algorithm, denoted LORAX, selects super-points (local subsets of points) and describes the geometric structure of each with a low-dimensional descriptor. These descriptors are then used to infer potential matching regions for an efficient coarse registration process, followed by a fine-tuning stage. The set of super-points is selected by covering the point clouds with overlapping spheres, and then filtering out those of low-quality or nonsalient regions. The descriptors are computed using state-of-the-art unsupervised machine learning, utilizing the technology of deep neural network based auto-encoders. This novel framework provides a strong alternative to the common practice of using manually designed key-point descriptors for coarse point cloud registration. Utilizing super-points instead of key-points allows the available geometrical data to be better exploited to find the correct transformation. Encoding local 3D geometric structures using a deep neural network auto-encoder instead of traditional descriptors continues the trend seen in other computer vision applications and indeed leads to superior results. The algorithm is tested on challenging point cloud registration datasets, and its advantages over previous approaches as well as its robustness to density changes, noise, and missing data are shown.
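The sketch below mimics the coarse stage: cover each cloud with spheres to get super-points, describe them, match descriptors, and fit a rigid transform to the matched centers. The learned auto-encoder descriptor is replaced by a hand-made covariance placeholder, and quality filtering, outlier rejection, and the fine-tuning stage are omitted.

```python
import numpy as np

def super_points(cloud, n_spheres=32, radius=0.2, rng=None):
    """Cover the cloud with random spheres and keep the local point subsets
    ("super-points"). The paper's quality filtering and learned descriptors are
    omitted; descriptors below are a hand-made placeholder."""
    rng = rng or np.random.default_rng(0)
    centers = cloud[rng.choice(len(cloud), n_spheres, replace=False)]
    subsets = [cloud[np.linalg.norm(cloud - c, axis=1) < radius] for c in centers]
    return centers, subsets

def descriptor(subset, center):
    # Placeholder low-dimensional descriptor: second moments of the local geometry.
    d = subset - center
    return np.cov(d.T).ravel() if len(d) > 3 else np.zeros(9)

def coarse_register(cloud_a, cloud_b):
    ca, sa = super_points(cloud_a)
    cb, sb = super_points(cloud_b)
    da = np.array([descriptor(s, c) for s, c in zip(sa, ca)])
    db = np.array([descriptor(s, c) for s, c in zip(sb, cb)])
    # Match each super-point in A to its nearest descriptor in B, then fit a rigid
    # transform to the matched centers (Kabsch); reflection handling and RANSAC-style
    # verification are skipped for brevity.
    nn = np.argmin(np.linalg.norm(da[:, None] - db[None], axis=2), axis=1)
    A, B = ca - ca.mean(0), cb[nn] - cb[nn].mean(0)
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = (U @ Vt).T
    t = cb[nn].mean(0) - R @ ca.mean(0)
    return R, t

R, t = coarse_register(np.random.rand(2000, 3), np.random.rand(2000, 3))
```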
Citations: 178
A Wide-Field-of-View Monocentric Light Field Camera
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.400
D. Dansereau, G. Schuster, J. Ford, Gordon Wetzstein
Light field (LF) capture and processing are important in an expanding range of computer vision applications, offering rich textural and depth information and simplification of conventionally complex tasks. Although LF cameras are commercially available, no existing device offers wide field-of-view (FOV) imaging. This is due in part to the limitations of fisheye lenses, for which a fundamentally constrained entrance pupil diameter severely limits depth sensitivity. In this work we describe a novel, compact optical design that couples a monocentric lens with multiple sensors using microlens arrays, allowing LF capture with an unprecedented FOV. Leveraging capabilities of the LF representation, we propose a novel method for efficiently coupling the spherical lens and planar sensors, replacing expensive and bulky fiber bundles. We construct a single-sensor LF camera prototype, rotating the sensor relative to a fixed main lens to emulate a wide-FOV multi-sensor scenario. Finally, we describe a processing toolchain, including a convenient spherical LF parameterization, and demonstrate depth estimation and post-capture refocus for indoor and outdoor panoramas with 15 x 15 x 1600 x 200 pixels (72 MPix) and a 138° FOV.
Citations: 30
Specular Highlight Removal in Facial Images
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.297
Chen Li, Stephen Lin, Kun Zhou, K. Ikeuchi
We present a method for removing specular highlight reflections in facial images that may contain varying illumination colors. This is accurately achieved through the use of physical and statistical properties of human skin and faces. We employ a melanin and hemoglobin based model to represent the diffuse color variations in facial skin, and utilize this model to constrain the highlight removal solution in a manner that is effective even for partially saturated pixels. The removal of highlights is further facilitated through estimation of directionally variant illumination colors over the face, which is done while taking advantage of a statistically-based approximation of facial geometry. An important practical feature of the proposed method is that the skin color model is utilized in a way that does not require color calibration of the camera. Moreover, this approach does not require assumptions commonly needed in previous highlight removal techniques, such as uniform illumination color or piecewise-constant surface colors. We validate this technique through comparisons to existing methods for removing specular highlights.
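For context, the snippet below is a generic dichromatic-model baseline for highlight removal under an assumed known illumination colour. It is not the paper's melanin/hemoglobin skin model or its spatially varying illuminant estimation, but it shows the kind of diffuse/specular separation the method improves upon.

```python
import numpy as np

def remove_highlights(img, illum=(1.0, 1.0, 1.0)):
    """Generic dichromatic-model baseline (not the paper's method): after dividing
    out an assumed known illumination colour, the specular layer is taken as the
    per-pixel minimum over colour channels and subtracted from the image."""
    illum = np.asarray(illum, dtype=float)
    normalized = img / illum                         # neutralise illumination colour
    specular = normalized.min(axis=2, keepdims=True) # achromatic specular estimate
    diffuse = np.clip(normalized - specular, 0.0, 1.0) * illum
    return diffuse, specular

face = np.random.rand(128, 128, 3)
diffuse, spec = remove_highlights(face, illum=(1.0, 0.95, 0.9))
```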
Citations: 29
Dynamic FAUST: Registering Human Bodies in Motion
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.591
Federica Bogo, J. Romero, Gerard Pons-Moll, Michael J. Black
While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data, that is, 3D scans of moving non-rigid objects captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this, we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show how using geometry alone results in significant errors in alignment when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.
Citations: 304