
Latest publications: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Single Image Reflection Suppression
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.190
Nikolaos Arvanitopoulos, R. Achanta, S. Süsstrunk
Reflections are a common artifact in images taken through glass windows. Automatically removing the reflection artifacts after the picture is taken is an ill-posed problem. Attempts to solve this problem using optimization schemes therefore rely on various prior assumptions from the physical world. Instead of removing reflections from a single image, which has met with limited success so far, we propose a novel approach to suppress reflections. It is based on a Laplacian data fidelity term and an l-zero gradient sparsity term imposed on the output. With experiments on artificial and real-world images we show that our reflection suppression method performs better than the state-of-the-art reflection removal techniques.
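The optimization can be read, as one plausible instantiation (the notation below is an assumption based on the abstract, not the paper's exact energy), as

    \min_{T}\; \frac{1}{2}\,\lVert \Delta T - \Delta Y \rVert_2^2 \;+\; \lambda\,\lVert \nabla T \rVert_0,

where Y is the observed image, T the reflection-suppressed output, \Delta the Laplacian used in the data fidelity term, \nabla the spatial gradient, and \lambda trades fidelity against gradient sparsity. Because the \ell_0 term is non-convex, objectives of this form are typically minimized with half-quadratic splitting or similar alternating schemes rather than plain gradient descent.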
Citations: 106
FC^4: Fully Convolutional Color Constancy with Confidence-Weighted Pooling
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.43
Yuanming Hu, Baoyuan Wang, Stephen Lin
Improvements in color constancy have arisen from the use of convolutional neural networks (CNNs). However, the patch-based CNNs that exist for this problem are faced with the issue of estimation ambiguity, where a patch may contain insufficient information to establish a unique or even a limited possible range of illumination colors. Image patches with estimation ambiguity not only appear with great frequency in photographs, but also significantly degrade the quality of network training and inference. To overcome this problem, we present a fully convolutional network architecture in which patches throughout an image can carry different confidence weights according to the value they provide for color constancy estimation. These confidence weights are learned and applied within a novel pooling layer where the local estimates are merged into a global solution. With this formulation, the network is able to determine what to learn and how to pool automatically from color constancy datasets without additional supervision. The proposed network also allows for end-to-end training, and achieves higher efficiency and accuracy. On standard benchmarks, our network outperforms the previous state-of-the-art while achieving 120x greater efficiency.
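A minimal numpy sketch of the confidence-weighted pooling step (array shapes, variable names, and the unit-norm output are illustrative assumptions, not the paper's code): every spatial location contributes a local illuminant estimate weighted by its learned confidence, and the weighted average becomes the global estimate.

import numpy as np

def confidence_weighted_pooling(local_rgb, confidence, eps=1e-8):
    """local_rgb: (H, W, 3) per-location illuminant estimates from a fully
    convolutional head; confidence: (H, W) non-negative learned weights."""
    w = confidence[..., None]                           # (H, W, 1)
    pooled = (local_rgb * w).sum(axis=(0, 1)) / (w.sum() + eps)
    return pooled / (np.linalg.norm(pooled) + eps)      # unit-norm illuminant color

# toy usage with random maps standing in for network outputs
local = np.random.rand(8, 8, 3)
conf = np.random.rand(8, 8)
print(confidence_weighted_pooling(local, conf))

Ambiguous patches simply receive low confidence and therefore contribute little to the pooled estimate.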
Citations: 184
Deep Video Deblurring for Hand-Held Cameras
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.33
Shuochen Su, M. Delbracio, Jue Wang, G. Sapiro, W. Heidrich, Oliver Wang
Motion blur from camera shake is a major problem in videos captured by hand-held devices. Unlike single-image deblurring, video-based approaches can take advantage of the abundant information that exists across neighboring frames. As a result the best performing methods rely on the alignment of nearby frames. However, aligning images is a computationally expensive and fragile procedure, and methods that aggregate information must therefore be able to identify which regions have been accurately aligned and which have not, a task that requires high level scene understanding. In this work, we introduce a deep learning solution to video deblurring, where a CNN is trained end-to-end to learn how to accumulate information across frames. To train this network, we collected a dataset of real videos recorded with a high frame rate camera, which we use to generate synthetic motion blur for supervision. We show that the features learned from this dataset extend to deblurring motion blur that arises due to camera shake in a wide range of videos, and compare the quality of results to a number of other baselines.
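A toy PyTorch sketch of the input arrangement described above (the layer sizes and frame count are assumptions, not the paper's architecture): neighboring frames are stacked along the channel axis and a convolutional network regresses the sharp central frame.

import torch
import torch.nn as nn

class StackDeblurNet(nn.Module):
    """Toy three-layer CNN: 5 RGB frames stacked into 15 input channels."""
    def __init__(self, num_frames=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * num_frames, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),      # predict the sharp central frame
        )

    def forward(self, frames):                   # frames: (B, num_frames, 3, H, W)
        x = frames.flatten(1, 2)                 # (B, num_frames * 3, H, W)
        return self.net(x)

clip = torch.rand(2, 5, 3, 64, 64)               # two 5-frame blurry clips
print(StackDeblurNet()(clip).shape)              # torch.Size([2, 3, 64, 64])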
Citations: 445
What is and What is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.468
Changqun Xia, Jia Li, Xiaowu Chen, Anlin Zheng, Yu Zhang
Finding what is and what is not a salient object can be helpful in developing better features and models in salient object detection (SOD). In this paper, we investigate the images that are selected and discarded in constructing a new SOD dataset and find that many similar candidates, complex shape and low objectness are three main attributes of many non-salient objects. Moreover, objects may have diversified attributes that make them salient. As a result, we propose a novel salient object detector by ensembling linear exemplar regressors. We first select reliable foreground and background seeds using the boundary prior and then adopt locally linear embedding (LLE) to conduct manifold-preserving foregroundness propagation. In this manner, a foregroundness map can be generated to roughly pop-out salient objects and suppress non-salient ones with many similar candidates. Moreover, we extract the shape, foregroundness and attention descriptors to characterize the extracted object proposals, and a linear exemplar regressor is trained to encode how to detect salient proposals in a specific image. Finally, various linear exemplar regressors are ensembled to form a single detector that adapts to various scenarios. Extensive experimental results on 5 datasets and the new SOD dataset show that our approach outperforms 9 state-of-the-art methods.
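A minimal numpy sketch of the ensembling step only (the descriptor dimension and the averaging rule are assumptions for illustration): each exemplar regressor is a linear model over a proposal descriptor, and the ensemble scores a proposal by pooling the individual regressor outputs.

import numpy as np

rng = np.random.default_rng(0)
D, K, N = 128, 20, 50                 # descriptor dim, exemplar regressors, proposals

W = rng.normal(size=(K, D))           # one linear exemplar regressor per row
b = rng.normal(size=K)
X = rng.normal(size=(N, D))           # shape / foregroundness / attention descriptors

scores_per_exemplar = X @ W.T + b               # (N, K): saliency from each regressor
saliency = scores_per_exemplar.mean(axis=1)     # ensemble by averaging
print(np.argsort(-saliency)[:5])                # indices of the most salient proposals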
Citations: 82
4D Light Field Superpixel and Segmentation
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.710
Hao Zhu, Qi Zhang, Qing Wang
Superpixel segmentation of 2D image has been widely used in many computer vision tasks. However, limited to the Gaussian imaging principle, there is not a thorough segmentation solution to the ambiguity in defocus and occlusion boundary areas. In this paper, we consider the essential element of image pixel, i.e., rays in the light space and propose light field superpixel (LFSP) segmentation to eliminate the ambiguity. The LFSP is first defined mathematically and then a refocus-invariant metric named LFSP self-similarity is proposed to evaluate the segmentation performance. By building a clique system containing 80 neighbors in light field, a robust refocus-invariant LFSP segmentation algorithm is developed. Experimental results on both synthetic and real light field datasets demonstrate the advantages over the state-of-the-arts in terms of traditional evaluation metrics. Additionally the LFSP self-similarity evaluation under different light field refocus levels shows the refocus-invariance of the proposed algorithm.
Citations: 39
A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.393
Jiang-Jing Lv, Xiaohu Shao, Junliang Xing, Cheng Cheng, Xi Zhou
Regression based facial landmark detection methods usually learn a series of regression functions to update the landmark positions from an initial estimation. Most existing approaches focus on learning effective mapping functions with robust image features to improve performance. The approach to dealing with the initialization issue, however, has received relatively less attention. In this paper, we present a deep regression architecture with two-stage re-initialization to explicitly deal with the initialization problem. At the global stage, given an image with a rough face detection result, the full face region is first re-initialized by a supervised spatial transformer network to a canonical shape state and then trained to regress a coarse landmark estimation. At the local stage, different face parts are further separately re-initialized to their own canonical shape states, followed by another regression subnetwork to get the final estimation. Our proposed deep architecture is trained from end to end and obtains promising results using different kinds of unstable initialization. It also achieves superior performance over many competing algorithms.
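A toy PyTorch sketch of the global re-initialization stage only (layer sizes, the landmark count, and the single-shot regressor are placeholders, not the paper's network): a learned affine spatial transformer warps the rough face crop toward a canonical state before coarse landmark regression; the local stage would repeat the same idea per face part.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalReinit(nn.Module):
    """Toy global stage: predict an affine warp that re-initializes a rough face
    crop to a canonical pose, then regress a coarse landmark estimate."""
    def __init__(self, n_landmarks=68):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten())
        self.theta = nn.Linear(16 * 8 * 8, 6)            # 2x3 affine parameters
        self.regressor = nn.Linear(16 * 8 * 8, 2 * n_landmarks)
        nn.init.zeros_(self.theta.weight)                # start from the identity warp
        self.theta.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):                                # x: (B, 3, H, W) rough face crop
        theta = self.theta(self.features(x)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        canon = F.grid_sample(x, grid, align_corners=False)   # re-initialized face
        coarse = self.regressor(self.features(canon))
        return canon, coarse.view(-1, self.n_landmarks, 2)

crop = torch.rand(2, 3, 64, 64)
canon, landmarks = GlobalReinit()(crop)
print(canon.shape, landmarks.shape)                      # (2, 3, 64, 64), (2, 68, 2)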
Citations: 224
Using Ranking-CNN for Age Estimation
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.86
Shixing Chen, Caojin Zhang, Ming Dong, Jialiang Le, M. Rao
Human age is considered an important biometric trait for human identification or search. Recent research shows that the aging features deeply learned from large-scale data lead to significant performance improvement on facial image-based age estimation. However, age-related ordinal information is totally ignored in these approaches. In this paper, we propose a novel Convolutional Neural Network (CNN)-based framework, ranking-CNN, for age estimation. Ranking-CNN contains a series of basic CNNs, each of which is trained with ordinal age labels. Then, their binary outputs are aggregated for the final age prediction. We theoretically obtain a much tighter error bound for ranking-based age estimation. Moreover, we rigorously prove that ranking-CNN is more likely to get smaller estimation errors when compared with multi-class classification approaches. Through extensive experiments, we show that statistically, ranking-CNN significantly outperforms other state-of-the-art age estimation models on benchmark datasets.
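A small numpy sketch of the aggregation rule the abstract describes (the age range and threshold are assumed for illustration): the k-th basic CNN answers the binary question "is this face older than age min_age + k?", and the final prediction counts positive answers.

import numpy as np

def aggregate_ranking_outputs(binary_probs, min_age=16):
    """binary_probs[k] ~ P(age > min_age + k), one value per basic CNN."""
    older = (np.asarray(binary_probs) > 0.5).astype(int)
    return min_age + older.sum()          # count of "older than" decisions

# toy example: the first 12 rankers fire, the remaining 38 do not
probs = np.concatenate([np.full(12, 0.9), np.full(38, 0.1)])
print(aggregate_ranking_outputs(probs))   # 28

Because a single flipped ranker shifts the count by only one step, this kind of ordinal aggregation tends to degrade gracefully compared with treating ages as unrelated classes.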
Citations: 228
Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.167
Jinwei Gu, Xiaodong Yang, Shalini De Mello, J. Kautz
Facial analysis in videos, including head pose estimation and facial landmark localization, is key for many applications such as facial animation capture, human activity recognition, and human-computer interaction. In this paper, we propose to use a recurrent neural network (RNN) for joint estimation and tracking of facial features in videos. We are inspired by the fact that the computation performed in an RNN bears resemblance to Bayesian filters, which have been used for tracking in many previous methods for facial analysis from videos. Bayesian filters used in these methods, however, require complicated, problem-specific design and tuning. In contrast, our proposed RNN-based method avoids such tracker-engineering by learning from training data, similar to how a convolutional neural network (CNN) avoids feature-engineering for image classification. As an end-to-end network, the proposed RNN-based method provides a generic and holistic solution for joint estimation and tracking of various types of facial features from consecutive video frames. Extensive experimental results on head pose estimation and facial landmark localization from videos demonstrate that the proposed RNN-based method outperforms frame-wise models and Bayesian filtering. In addition, we create a large-scale synthetic dataset for head pose estimation, with which we achieve state-of-the-art performance on a benchmark dataset.
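A toy PyTorch sketch of the estimation-plus-tracking idea (the feature dimension, hidden size, and pose parameterization are assumptions, not the paper's network): per-frame CNN features feed a recurrent layer whose hidden state plays the role of the Bayesian filter's state, and a linear head regresses head pose for every frame.

import torch
import torch.nn as nn

class RNNPoseTracker(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, pose_dim=3):   # yaw, pitch, roll
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, frame_feats):          # (B, T, feat_dim) per-frame CNN features
        h, _ = self.rnn(frame_feats)         # hidden state carries temporal context
        return self.head(h)                  # (B, T, pose_dim): a pose for every frame

clip_feats = torch.rand(2, 30, 256)          # features for two 30-frame clips
print(RNNPoseTracker()(clip_feats).shape)    # torch.Size([2, 30, 3])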
Citations: 111
Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.50
Zhuo Deng, Longin Jan Latecki
This paper addresses the problem of amodal perception of 3D object detection. The task is to not only find object localizations in the 3D world, but also estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness point cloud from depth channel to exploit 3D features directly in the 3D space and demonstrated the superiority over traditional 2.5D representation approaches. We revisit the amodal 3D detection problem by sticking to the 2.5D representation framework, and directly relate 2.5D visual appearance to 3D objects. We propose a novel 3D object detection system that simultaneously predicts objects 3D locations, physical sizes, and orientations in indoor scenes. Experiments on the NYUV2 dataset show our algorithm significantly outperforms the state-of-the-art and indicates 2.5D representation is capable of encoding features for 3D amodal object detection. All source code and data is on https://github.com/phoenixnn/Amodal3Det.
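A minimal PyTorch sketch of the final regression step implied by the abstract (the feature size and the 7-parameter box encoding are assumptions for illustration): from the 2.5D appearance feature of a 2D proposal, a small head predicts the amodal 3D box centre, physical size, and orientation.

import torch
import torch.nn as nn

class AmodalBoxHead(nn.Module):
    """Predicts (cx, cy, cz, w, h, l, yaw) for each 2D proposal feature."""
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 7),               # 3D centre (3) + size (3) + yaw (1)
        )

    def forward(self, roi_feats):            # (num_proposals, feat_dim)
        return self.fc(roi_feats)

proposals = torch.rand(32, 1024)             # pooled 2.5D features of 32 proposals
print(AmodalBoxHead()(proposals).shape)      # torch.Size([32, 7])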
Citations: 99
Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.654
Torsten Sattler, A. Torii, Josef Sivic, M. Pollefeys, Hajime Taira, M. Okutomi, T. Pajdla
Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6DOF pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct and to maintain. They are often considered inaccurate since they only approximate the positions of the cameras. Yet, the exact camera pose can theoretically be recovered when enough relevant database images are retrieved. In this paper, we demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions results in a pose accuracy similar to the state-of-the-art structure-based methods. Our results suggest that we might want to reconsider the current approach for accurate large-scale localization.
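A toy numpy sketch of why pure retrieval is only approximate (descriptors and poses are synthetic placeholders): the query position is read off the top-ranked geo-tagged database images rather than being estimated geometrically, which is exactly the gap that the local-reconstruction step is meant to close.

import numpy as np

rng = np.random.default_rng(1)
db_desc = rng.normal(size=(1000, 128))               # global image descriptors
db_desc /= np.linalg.norm(db_desc, axis=1, keepdims=True)
db_pos = rng.uniform(0, 500, size=(1000, 3))         # geo-tagged camera positions (m)

query = rng.normal(size=128)
query /= np.linalg.norm(query)

sim = db_desc @ query                                # cosine similarity to the query
topk = np.argsort(-sim)[:5]                          # retrieve the 5 nearest images
approx_position = db_pos[topk].mean(axis=0)          # a position guess, not a 6DOF pose
print(topk, approx_position)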
Citations: 17