
Latest publications: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Modeling Facial Geometry Using Compositional VAEs
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00408
Timur M. Bagautdinov, Chenglei Wu, Jason M. Saragih, P. Fua, Yaser Sheikh
We propose a method for learning non-linear face geometry representations using deep generative models. Our model is a variational autoencoder with multiple levels of hidden variables, where lower layers capture global geometry and higher ones encode more local deformations. Based on that, we propose a new parameterization of facial geometry that naturally decomposes the structure of the human face into a set of semantically meaningful levels of detail. This parameterization enables us to do model fitting while capturing varying levels of detail under different types of geometrical constraints.
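A minimal sketch of the general idea — a VAE whose latent code is split into a coarse level for global shape and a finer level for local deformation — is given below. The layer widths, the two-level split, and the flattened-mesh input are illustrative assumptions for this sketch, not the architecture described in the paper.

```python
import torch
import torch.nn as nn

class TwoLevelVAE(nn.Module):
    """Toy two-level VAE: z_global captures coarse geometry, z_local adds finer deformations."""
    def __init__(self, n_verts=1000, d_global=16, d_local=64):
        super().__init__()
        d_in = n_verts * 3                                   # flattened (x, y, z) mesh vertices
        self.enc = nn.Sequential(nn.Linear(d_in, 512), nn.ReLU())
        self.to_global = nn.Linear(512, 2 * d_global)        # mean and log-variance
        self.to_local = nn.Linear(512, 2 * d_local)
        self.dec_global = nn.Sequential(nn.Linear(d_global, 512), nn.ReLU(), nn.Linear(512, d_in))
        self.dec_local = nn.Sequential(nn.Linear(d_global + d_local, 512), nn.ReLU(), nn.Linear(512, d_in))

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, x):
        h = self.enc(x)
        z_g, mu_g, lv_g = self.reparam(self.to_global(h))
        z_l, mu_l, lv_l = self.reparam(self.to_local(h))
        coarse = self.dec_global(z_g)                                   # global face shape
        refined = coarse + self.dec_local(torch.cat([z_g, z_l], -1))    # add local deformation
        kl = -0.5 * (1 + lv_g - mu_g**2 - lv_g.exp()).sum() \
             -0.5 * (1 + lv_l - mu_l**2 - lv_l.exp()).sum()
        return refined, kl

x = torch.randn(8, 1000 * 3)                    # a batch of flattened meshes (random stand-in data)
recon, kl = TwoLevelVAE()(x)
loss = ((recon - x) ** 2).mean() + 1e-4 * kl    # reconstruction + KL; the weight is arbitrary
```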
Citations: 93
Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00340
Younghyun Jo, Seoung Wug Oh, Jaeyeon Kang, Seon Joo Kim
Video super-resolution (VSR) has recently become even more important as a way to provide high-resolution (HR) content for ultra-high-definition displays. While many deep-learning-based VSR methods have been proposed, most of them rely heavily on the accuracy of motion estimation and compensation. In this paper we introduce a fundamentally different framework for VSR. We propose a novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, which are computed depending on the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation. With our approach, an HR image is reconstructed directly from the input image using the dynamic upsampling filters, and the fine details are added through the computed residual. With the help of a new data augmentation technique, our network can generate much sharper HR videos with temporal consistency than previous methods. We also analyze our network through extensive experiments to show how it deals with motions implicitly.
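To make the notion of dynamic upsampling filters concrete, the sketch below applies a set of per-pixel filters (here random, standing in for the network's prediction) to a low-resolution frame to produce the high-resolution output; the residual branch is omitted, and the kernel size, scale factor, and single-channel input are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def dynamic_upsample(lr, filters, r, k):
    """Apply per-pixel dynamic filters to upscale `lr` by factor r.
    lr:      (B, 1, H, W) low-resolution frame
    filters: (B, H*W, r*r, k*k) one k*k filter per output sub-pixel, per input pixel
    """
    B, _, H, W = lr.shape
    patches = F.unfold(lr, kernel_size=k, padding=k // 2)        # (B, k*k, H*W) local neighborhoods
    # For every input pixel, each of its r*r filters produces one HR sub-pixel.
    out = torch.einsum('bnsp,bpn->bns', filters, patches)        # (B, H*W, r*r)
    out = out.view(B, H, W, r, r).permute(0, 1, 3, 2, 4)         # interleave sub-pixels spatially
    return out.reshape(B, 1, H * r, W * r)

B, H, W, r, k = 2, 16, 16, 4, 5
lr = torch.rand(B, 1, H, W)
filters = torch.softmax(torch.randn(B, H * W, r * r, k * k), dim=-1)  # stand-in for predicted filters
hr = dynamic_upsample(lr, filters, r, k)
print(hr.shape)   # torch.Size([2, 1, 64, 64])
```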
Citations: 439
Occlusion-Aware Rolling Shutter Rectification of 3D Scenes
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00073
Subeesh Vasu, R. MaheshMohanM., A. Rajagopalan
A vast majority of contemporary cameras employ a rolling shutter (RS) mechanism to capture images. Due to this sequential mechanism, images acquired with a moving camera are subject to the rolling shutter effect, which manifests as geometric distortions. In this work, we consider the specific scenario of a fast-moving camera, wherein the rolling shutter distortions are not only predominant but also depth-dependent, which in turn results in intra-frame occlusions. To this end, we develop a first-of-its-kind pipeline to recover the latent image of a 3D scene from a set of such RS-distorted images. The proposed approach sequentially recovers both the camera motion and the scene structure while accounting for RS and occlusion effects. Subsequently, we perform depth- and occlusion-aware rectification of the RS images to yield the desired latent image. Our experiments on synthetic and real image sequences show that the proposed approach achieves state-of-the-art results.
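The distortion arises because each image row is exposed at a slightly different time, so a moving camera observes each row from a slightly different pose. The toy sketch below (a simplified illustration, not the paper's pipeline) shows this row-wise projection model for a purely translating camera, where the capture time depends on the row and the row depends on the capture time, resolved here by fixed-point iteration.

```python
import numpy as np

def rs_project(X, K, velocity, readout, n_iter=10):
    """Project a 3D point X with a rolling-shutter camera translating at `velocity`.
    K: 3x3 intrinsics; readout: time to read out one row. Toy model: rotation ignored."""
    v = 0.0                                   # initial guess: point lands on the first row
    for _ in range(n_iter):                   # the row determines the time, the time the pose
        t = v * readout
        Xc = X - velocity * t                 # camera has moved by velocity * t
        x = K @ Xc
        u, v = x[0] / x[2], x[1] / x[2]       # new column/row estimate
    return np.array([u, v])

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
X = np.array([0.2, 0.3, 2.0])                 # a 3D point in the camera frame at t = 0
print(rs_project(X, K, velocity=np.array([0.5, 0.0, 0.0]), readout=1e-4))
```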
Citations: 33
On the Importance of Label Quality for Semantic Segmentation
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00160
A. Zlateski, Ronnachai Jaroensri, Prafull Sharma, F. Durand
Convolutional networks (ConvNets) have become the dominant approach to semantic image segmentation. Producing the accurate, pixel-level labels required for this task is a tedious and time-consuming process; producing approximate, coarse labels, however, takes only a fraction of the time and effort. We investigate the relationship between the quality of labels and the performance of ConvNets for semantic segmentation. We create a very large synthetic dataset with perfectly labeled street-view scenes. From these perfect labels, we synthetically coarsen labels of different qualities and estimate the human-hours required to produce them. We perform a series of experiments by training ConvNets with a varying number of training images and varying label quality. We find that the performance of ConvNets mostly depends on the time spent creating the training labels. That is, a larger coarsely-annotated dataset can yield the same performance as a smaller finely-annotated one. Furthermore, fine-tuning a coarsely pre-trained ConvNet with a few finely annotated labels can yield performance comparable or superior to training it on a large number of finely annotated labels alone, at a fraction of the labeling cost. We demonstrate that our result also holds for different network architectures and for the various object classes in an urban scene.
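As a rough illustration of what coarser labels can look like (this is not the synthesis procedure used in the paper), one can start from a perfect label map and keep only the interior of each class region, marking a boundary band as void so it is ignored during training:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def coarsen_labels(labels, band=3, void=255):
    """Crudely simulate coarse annotation: keep class interiors, mark a `band`-pixel
    boundary strip as void (ignored during training)."""
    coarse = np.full_like(labels, void)
    for c in np.unique(labels):
        interior = binary_erosion(labels == c, iterations=band)
        coarse[interior] = c
    return coarse

labels = np.zeros((64, 64), dtype=np.uint8)    # toy label map: background 0, one square of class 1
labels[20:44, 20:44] = 1
print((coarsen_labels(labels) == 255).mean())  # fraction of pixels left unlabeled
```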
Citations: 67
Egocentric Activity Recognition on a Budget
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00625
Rafael Possas, Sheila M. Pinto-Caceres, F. Ramos
Recent advances in embedded technology have enabled more pervasive machine learning. One of the common applications in this field is Egocentric Activity Recognition (EAR), where users wearing a device such as a smartphone or smartglasses are able to receive feedback from the embedded device. Recent research on activity recognition has mainly focused on improving accuracy by using resource-intensive techniques such as multi-stream deep networks. Although this approach has provided state-of-the-art results, in most cases it neglects the natural resource constraints (e.g. battery) of wearable devices. We develop a model-free Reinforcement Learning method to learn energy-aware policies that maximize the use of low-energy-cost predictors while keeping competitive accuracy levels. Our results show that a policy trained on an egocentric dataset is able to use the synergy between motion and vision sensors to effectively trade off energy expenditure and accuracy on smartglasses operating in realistic, real-world conditions.
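The underlying trade-off can be expressed as a reward that pays for correct predictions and charges for the energy of the sensors the policy switches on; the sensor costs and the trade-off weight below are made-up illustrative values, not numbers from the paper.

```python
# A minimal sketch of an energy-aware reward for a sensor-selection policy.
# The per-sensor costs and the trade-off weight are illustrative, made-up numbers.
ENERGY_COST = {"imu": 1.0, "camera": 25.0, "camera+imu": 26.0}

def reward(prediction_correct: bool, sensors: str, trade_off: float = 0.01) -> float:
    accuracy_term = 1.0 if prediction_correct else 0.0
    return accuracy_term - trade_off * ENERGY_COST[sensors]

# The policy is rewarded for relying on the cheap IMU when it suffices,
# and only pays the camera's cost when vision is actually needed.
print(reward(True, "imu"), reward(True, "camera"))
```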
Citations: 37
Camera Pose Estimation with Unknown Principal Point
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00315
Viktor Larsson, Z. Kukelova, Yinqiang Zheng
Estimating the 6-DoF extrinsic pose of a pinhole camera with partially unknown intrinsic parameters is a critical sub-problem in structure-from-motion and camera localization. In most existing camera pose estimation solvers, the principal point is assumed to be at the image center. Unfortunately, this assumption is not always true, especially for asymmetrically cropped images. In this paper, we develop the first exactly minimal solver for the case of unknown principal point and focal length by using four and a half point correspondences (P4.5Pfuv). We also present an extremely fast solver for the case of unknown aspect ratio (P5Pfuva). The new solvers outperform the previous state of the art in terms of stability and speed. Finally, we explore the extremely challenging case of both unknown principal point and radial distortion, and develop the first practical non-minimal solver using seven point correspondences (P7Pfruv). Experimental results on both simulated data and real Internet images demonstrate the usefulness of our new solvers.
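The point counts follow from counting unknowns in the standard pinhole projection model,

\[
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & \mathbf{t} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.
\]

With rotation (3), translation (3), focal length (1), and principal point (2) there are 9 unknowns, and each 2D-3D correspondence contributes two equations, hence 4.5 correspondences for an exactly minimal solution (P4.5Pfuv); an unknown aspect ratio adds a tenth unknown and raises the count to 5 points (P5Pfuva).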
Citations: 25
Towards Pose Invariant Face Recognition in the Wild
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00235
Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, F. Zhao, J. Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng
Pose variation is one key challenge in face recognition. As opposed to current techniques for pose-invariant face recognition, which either directly extract pose-invariant features for recognition or first normalize profile face images to frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned end to end. Second, FFN is a carefully designed dual-path Generative Adversarial Network (GAN) that simultaneously perceives global structures and local details, incorporating unsupervised cross-domain adversarial training and a "learning to learn" strategy for high-fidelity, identity-preserving frontal view synthesis. Third, DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representations. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed model over the state of the art.
Citations: 185
Statistical Tomography of Microscopic Life
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00671
Aviad Levis, Y. Schechner, R. Talmon
We achieve tomography of 3D volumetric natural objects, where each projected 2D image corresponds to a different specimen. Each specimen has unknown random 3D orientation, location, and scale. This imaging scenario is relevant to microscopic and mesoscopic organisms, aerosols, and hydrosols viewed naturally by a microscope. In-class scale variation inhibits prior single-particle reconstruction methods. We thus generalize tomographic recovery to account for all degrees of freedom of a similarity transformation. This enables geometric self-calibration in imaging of transparent objects. We keep the computational load manageable and reach good-quality reconstructions in a short time. This enables the extraction of statistics that are important for a scientific study of specimen populations, specifically size distribution parameters. We apply the method to the study of plankton.
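The similarity transformation that the recovery must account for has seven degrees of freedom per specimen — three for orientation, three for location, and one for scale:

\[
\mathbf{x}' = s\,R\,\mathbf{x} + \mathbf{t}, \qquad R \in SO(3),\; s > 0,\; \mathbf{t} \in \mathbb{R}^3 .
\]

It is precisely the per-specimen scale \(s\) that in-class size variation randomizes, which is what limits prior single-particle reconstruction methods here.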
Citations: 9
LAMV: Learning to Align and Match Videos with Kernelized Temporal Layers
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00814
L. Baraldi, Matthijs Douze, R. Cucchiara, H. Jégou
This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection, and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
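A stripped-down stand-in for the temporal alignment step — plain FFT cross-correlation of frame descriptors, with no learned kernel and no triplet training — is sketched below; the descriptor dimension and sequence lengths are arbitrary illustrative choices.

```python
import numpy as np

def best_alignment(a, b):
    """Score every temporal offset between two frame-descriptor sequences a (Ta, d)
    and b (Tb, d) via cross-correlation along time, computed with an FFT,
    and return the best offset and its score."""
    Ta, Tb = len(a), len(b)
    n = Ta + Tb - 1                                    # zero-pad to cover all linear offsets
    fa = np.fft.rfft(a, n=n, axis=0)
    fb = np.fft.rfft(b, n=n, axis=0)
    corr = np.fft.irfft(fa * np.conj(fb), n=n, axis=0).sum(axis=1)  # sum over descriptor dims
    offsets = np.arange(n)
    offsets[offsets >= Ta] -= n                        # map wrap-around bins to negative offsets
    best = corr.argmax()
    return offsets[best], corr[best]

rng = np.random.default_rng(0)
video = rng.normal(size=(60, 32))
query = video[20:40]                                   # a clip cut out of the video
print(best_alignment(video, query))                    # best offset should come out as 20
```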
Citations: 39
Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00832
Qingchao Chen, Yang Liu, Zhaowen Wang, I. Wassell, K. Chetty
Unsupervised Domain Adaptation (UDA) aims to transfer domain knowledge from existing well-defined tasks to new ones where labels are unavailable. In real-world applications, the domain (task) discrepancies are usually uncontrollable, so there is strong motivation to match the feature distributions even when the domain discrepancies are disparate. Additionally, as no labels are available in the target domain, how to successfully adapt the classifier from the source to the target domain remains an open question. In this paper, we propose the Re-weighted Adversarial Adaptation Network (RAAN) to reduce the feature distribution divergence and adapt the classifier when domain discrepancies are disparate. Specifically, to alleviate the need for common support when matching the feature distributions, we choose to minimize the optimal transport (OT) based Earth-Mover (EM) distance and reformulate it as a minimax objective function. Utilizing this, RAAN can be trained in an end-to-end, adversarial manner. To further adapt the classifier, we propose to match the label distribution and embed it into the adversarial training. Finally, after extensive evaluation of our method on UDA datasets of varying difficulty, RAAN achieved state-of-the-art results and outperformed other methods by a large margin when the domain shifts are disparate.
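For reference, the minimax reformulation of the EM distance typically rests on its Kantorovich-Rubinstein dual: with a feature extractor \(F\) and a 1-Lipschitz domain critic \(D\),

\[
W\!\left(\mathbb{P}_s, \mathbb{P}_t\right)
= \sup_{\lVert D \rVert_L \le 1}
\; \mathbb{E}_{x \sim \mathbb{P}_s}\!\left[D\!\left(F(x)\right)\right]
- \mathbb{E}_{x \sim \mathbb{P}_t}\!\left[D\!\left(F(x)\right)\right],
\]

where the critic maximizes the gap between source and target feature distributions and \(F\) is trained to minimize it; the label-distribution matching and re-weighting described in the abstract are layered on top of this adversarial game.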
Citations: 113