
Latest Publications: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Neural Style Transfer via Meta Networks
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00841
Falong Shen, Shuicheng Yan, Gang Zeng
In this paper we propose a novel method to generate the specified network parameters through a single feed-forward propagation in a meta network for neural style transfer. Recent works on style transfer typically need to train an image transformation network for every new style, and the style is encoded in the network parameters by enormous iterations of stochastic gradient descent, which lacks the generalization ability to new styles at the inference stage. To tackle these issues, we build a meta network which takes in the style image and directly generates a corresponding image transformation network. Compared with optimization-based methods that retrain for every style, our meta network can handle an arbitrary new style within 19 milliseconds on one modern GPU card. The fast image transformation network generated by our meta network is only 449 KB and is capable of running in real time on a mobile device. We also investigate the manifold of style transfer networks by manipulating the hidden features of the meta network. Experiments validate the effectiveness of our method. Code and trained models will be released.
Pages: 8061-8069
Citations: 105
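To make the parameter-generation idea in the abstract above concrete, here is a minimal, hypothetical sketch (not the authors' released code): a meta network maps a style embedding to the weights of one convolutional layer of a transformation network, so a new style requires only a single forward pass. All sizes (256-d style embedding, one 3x3 layer, 8 output channels) are illustrative assumptions.

```python
# A minimal sketch: a meta network predicts the parameters of a small
# transformation layer from a style embedding, then applies that layer.
import torch
import torch.nn.functional as F
from torch import nn

class MetaNet(nn.Module):
    def __init__(self, style_dim=256, channels=8):
        super().__init__()
        self.channels = channels
        # Predict one 3x3 conv layer (weights + bias) of the transfer network.
        n_params = channels * 3 * 3 * 3 + channels
        self.fc = nn.Linear(style_dim, n_params)

    def forward(self, style_embedding, content_image):
        p = self.fc(style_embedding)                     # flat parameter vector
        w_end = self.channels * 3 * 3 * 3
        weight = p[:w_end].view(self.channels, 3, 3, 3)  # generated conv filters
        bias = p[w_end:]                                 # generated conv bias
        # Run the *generated* transformation layer on the content image.
        return F.conv2d(content_image, weight, bias, padding=1)

meta = MetaNet()
style = torch.randn(256)             # stand-in for an encoded style image
content = torch.randn(1, 3, 64, 64)  # stand-in content image
out = meta(style, content)
print(out.shape)                     # torch.Size([1, 8, 64, 64])
```
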
Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00376
Wenjie Luo, Binh Yang, R. Urtasun
In this paper we propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse data at range. Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world, which is very efficient in terms of both memory and computation. Our experiments on a new, very large-scale dataset captured in several North American cities show that we can outperform the state of the art by a large margin. Importantly, by sharing computation we can perform all tasks in as little as 30 ms.
Pages: 3569-3577
Citations: 540
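A minimal sketch of the shared bird's-eye-view backbone described in the abstract above, under assumed tensor shapes (this is not the paper's architecture): a stack of 3D convolutions over (time, x, y) feeds both a detection head and a motion-forecasting head, so computation is shared across tasks.

```python
# A toy joint detection + forecasting network on a BEV occupancy tensor.
import torch
from torch import nn

class JointDetectForecast(nn.Module):
    def __init__(self, height_channels=16, horizon=3):
        super().__init__()
        # 3D convolution over (time, x, y) of the bird's-eye-view grid.
        self.backbone = nn.Sequential(
            nn.Conv3d(height_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.detect_head = nn.Conv3d(32, 1, kernel_size=1)               # objectness per cell
        self.forecast_head = nn.Conv3d(32, 2 * horizon, kernel_size=1)   # (dx, dy) per future step

    def forward(self, bev_sequence):
        # bev_sequence: (batch, height_channels, frames, X, Y)
        feats = self.backbone(bev_sequence)
        return self.detect_head(feats), self.forecast_head(feats)

net = JointDetectForecast()
bev = torch.randn(1, 16, 5, 128, 128)   # 5 past frames of a 128x128 BEV grid
scores, motions = net(bev)
print(scores.shape, motions.shape)
```
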
Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00162
Lanqing Hu, Meina Kan, S. Shan, Xilin Chen
Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between the source and target domains. Most existing works endeavor to learn a domain-invariant representation, usually by minimizing a distribution distance, e.g., MMD or the discriminator of the recently proposed generative adversarial network (GAN). Following a similar idea to GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts: an encoder, a generator and two discriminators. The encoder embeds samples from both domains into the latent representation, and the generator decodes the latent representation to the source and target domains respectively, conditioned on a domain code, i.e., it achieves domain transformation. The generator is pitted against the duplex discriminators, one for the source domain and the other for the target, to ensure the realism of the domain transformation and to keep the latent representation domain-invariant while preserving its category information. Our proposed method achieves state-of-the-art performance on unsupervised domain adaptation for digit classification and object recognition.
Pages: 1498-1507
Citations: 153
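A minimal sketch of the encoder / conditioned generator / duplex discriminator layout described in the abstract above, with assumed MNIST-like sizes (this is not the authors' DupGAN implementation); the discriminators output a real/fake score plus class logits so category information can be preserved.

```python
# Toy encoder, domain-code-conditioned generator, and two discriminators.
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
    def forward(self, x):
        return self.net(x)                      # latent representation

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128 + 1, 256), nn.ReLU(),
                                 nn.Linear(256, 28 * 28), nn.Tanh())
    def forward(self, z, domain_code):
        # Condition decoding on a scalar domain code (0 = source, 1 = target).
        code = torch.full((z.size(0), 1), float(domain_code))
        return self.net(torch.cat([z, code], dim=1)).view(-1, 1, 28, 28)

def make_discriminator(num_classes):
    # Real/fake score plus class logits, so category information is kept.
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                         nn.Linear(256, 1 + num_classes))

enc, gen = Encoder(), Generator()
d_source, d_target = make_discriminator(10), make_discriminator(10)
x = torch.randn(4, 1, 28, 28)                   # a batch of source-domain digits
z = enc(x)
to_target = gen(z, domain_code=1)               # domain transformation
print(d_target(to_target).shape)                # torch.Size([4, 11])
```
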
Feature Super-Resolution: Make Machine See More Clearly
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00420
Weimin Tan, Bo Yan, Bahetiyaer Bare
Identifying small-size images or small objects is a notoriously challenging problem, as discriminative representations are difficult to learn from their limited information, poor-quality appearance and unclear object structure. Existing research works usually increase the resolution of low-resolution images in the pixel space in order to provide better visual quality for human viewing. However, the improved performance of such methods is usually limited or even trivial in the case of very small image sizes (as we show explicitly in this paper). In this paper, different from image super-resolution (ISR), we propose a novel super-resolution technique called feature super-resolution (FSR), which aims at enhancing the discriminatory power of small-size images in order to provide high recognition precision for machines. To achieve this goal, we propose a new Feature Super-Resolution Generative Adversarial Network (FSR-GAN) model that transforms the raw, poor features of small-size images into highly discriminative ones by performing super-resolution in the feature space. Our FSR-GAN consists of two subnetworks: a feature generator network G and a feature discriminator network D. By training the G and D networks in an alternating manner, we encourage the G network to discover the latent distribution correlations between small-size and large-size images and then use G to improve the representations of small images. Extensive experimental results on the Oxford5K, Paris, Holidays, and Flick100k datasets demonstrate that the proposed FSR approach can effectively enhance the discriminatory ability of features. Even when the resolution of query images is reduced greatly, e.g., to 1/64 of the original size, the query feature enhanced by our FSR approach achieves surprisingly high retrieval performance at different image resolutions and increases the retrieval precision by 25% compared to the raw query feature.
Pages: 3994-4002
Citations: 40
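A minimal sketch of adversarial super-resolution in feature space, with an assumed 512-d descriptor (this is not the authors' FSR-GAN): the generator G refines the feature of a downsampled image, and the discriminator D plus an MSE term pull it toward the feature of the full-resolution image.

```python
# Toy feature-space GAN: G and D operate on descriptor vectors, not pixels.
import torch
from torch import nn

feat_dim = 512   # assumed descriptor length

G = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                  nn.Linear(1024, feat_dim))          # feature generator
D = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1))                  # feature discriminator

small_feat = torch.randn(8, feat_dim)     # features of low-resolution queries
big_feat = torch.randn(8, feat_dim)       # features of the original images

enhanced = G(small_feat)
# Adversarial term: make enhanced features look "high resolution" to D.
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    D(enhanced), torch.ones(8, 1))
# Content term: stay close to the true full-resolution feature.
content_loss = nn.functional.mse_loss(enhanced, big_feat)
total = adv_loss + content_loss
total.backward()
print(float(total))
```
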
Multi-level Fusion Based 3D Object Detection from Monocular Images
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00249
Bin Xu, Zhenzhong Chen
In this paper, we present an end-to-end multi-level fusion based framework for 3D object detection from a single monocular image. The whole network is composed of two parts: one for 2D region proposal generation and another for simultaneous prediction of objects' 2D locations, orientations, dimensions, and 3D locations. With the help of a stand-alone module that estimates the disparity and computes the 3D point cloud, we introduce a multi-level fusion scheme. First, we encode the disparity information with a front-view feature representation and fuse it with the RGB image to enhance the input. Second, features extracted from the original input and the point cloud are combined to boost object detection. For 3D localization, we introduce an extra stream to predict the location information from the point cloud directly and add it to the aforementioned location prediction. The proposed algorithm can directly output both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as input. The experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms monocular state-of-the-art methods.
Pages: 2345-2353
Citations: 260
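A minimal sketch of the two-level fusion idea, with assumed input shapes (this is not the paper's network): a disparity map is concatenated with the RGB input (input-level fusion), and image features are concatenated with point-cloud-derived features (feature-level fusion) before the 2D and 3D heads.

```python
# Toy two-stream detector with input-level and feature-level fusion.
import torch
from torch import nn

class FusionDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = nn.Sequential(nn.Conv2d(3 + 1, 16, 3, padding=1), nn.ReLU())
        self.point_stream = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.head_2d = nn.Conv2d(32, 4, 1)    # per-location 2D box parameters
        self.head_3d = nn.Conv2d(32, 7, 1)    # x, y, z, w, h, l, yaw

    def forward(self, rgb, disparity, point_map):
        # Input-level fusion: concatenate the disparity map with the RGB image.
        img_feat = self.rgb_stream(torch.cat([rgb, disparity], dim=1))
        pc_feat = self.point_stream(point_map)
        # Feature-level fusion: concatenate the two streams.
        fused = torch.cat([img_feat, pc_feat], dim=1)
        return self.head_2d(fused), self.head_3d(fused)

net = FusionDetector()
rgb = torch.randn(1, 3, 96, 320)
disparity = torch.randn(1, 1, 96, 320)    # estimated by a stand-alone module
point_map = torch.randn(1, 3, 96, 320)    # XYZ coordinates projected to the image plane
boxes_2d, boxes_3d = net(rgb, disparity, point_map)
print(boxes_2d.shape, boxes_3d.shape)
```
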
A Fast Resection-Intersection Method for the Known Rotation Problem
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00318
Qianggong Zhang, Tat-Jun Chin, Huu Le
The known rotation problem refers to a special case of structure-from-motion where the absolute orientations of the cameras are known. When formulated as a minimax (ℓ∞) problem on reprojection errors, the problem is an instance of pseudo-convex programming. Though theoretically tractable, solving the known rotation problem on large-scale data (1,000s of views, 10,000s of scene points) using existing methods can be very time-consuming. In this paper, we devise a fast algorithm for the known rotation problem. Our approach alternates between pose estimation and triangulation (i.e., resection-intersection) to break the problem into multiple simpler instances of pseudo-convex programming. The key to the vastly superior performance of our method lies in using a novel minimum enclosing ball (MEB) technique for the calculation of updating steps, which obviates the need for convex optimisation routines and greatly reduces memory footprint. We demonstrate the practicality of our method on large-scale problem instances which easily overwhelm current state-of-the-art algorithms.
Pages: 3012-3021
Citations: 10
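A minimal numerical sketch of the resection-intersection alternation on a toy problem (the paper's minimum-enclosing-ball updates are replaced here by a generic Nelder-Mead solver, so this is only the block-coordinate skeleton, not the proposed algorithm): with rotations fixed, camera translations and 3D points are refined in turn, each minimising the maximum reprojection error of its own residuals.

```python
# Toy resection-intersection loop for a minimax reprojection objective.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
R = [np.eye(3), np.eye(3)]                        # known camera rotations (toy: identity)
t_true = [rng.normal(size=3) for _ in R]          # ground-truth translations
X = rng.normal(size=(5, 3)) + [0.0, 0.0, 10.0]    # 3D points in front of the cameras
obs = [[R[c] @ X[p] + t_true[c] for p in range(5)] for c in range(2)]
obs = [[v[:2] / v[2] for v in cam] for cam in obs]        # noiseless image observations
t = [tc + 0.1 * rng.normal(size=3) for tc in t_true]      # perturbed starting translations

def max_err_cam(tc, c):
    proj = X @ R[c].T + tc
    return max(np.linalg.norm(proj[p, :2] / proj[p, 2] - obs[c][p]) for p in range(5))

def max_err_point(Xp, p):
    errs = []
    for c in range(2):
        v = R[c] @ Xp + t[c]
        errs.append(np.linalg.norm(v[:2] / v[2] - obs[c][p]))
    return max(errs)

for _ in range(5):                                 # block-coordinate passes
    for c in range(2):                             # resection: cameras, points fixed
        t[c] = minimize(max_err_cam, t[c], args=(c,), method="Nelder-Mead").x
    for p in range(5):                             # intersection: points, cameras fixed
        X[p] = minimize(max_err_point, X[p], args=(p,), method="Nelder-Mead").x

print("max reprojection error:", max(max_err_cam(t[c], c) for c in range(2)))
```
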
PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00707
Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han
Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flow. However, optical flow usually contains motion on both the human bodies of interest and the undesired background. This "noisy" motion representation makes pose estimation and action recognition very challenging in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur and remaining robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential for improving two challenging tasks in human video analysis: pose estimation and action recognition.
Pages: 6762-6770
Citations: 31
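A minimal sketch of a two-frame network with shared features and two heads, under assumed shapes (this is not PoseFlow Net itself): one head localises body-joint heatmaps and the other predicts a displacement field used for matching joints across frames.

```python
# Toy two-frame network: shared encoder, joint-heatmap head, motion head.
import torch
from torch import nn

class TwoFrameMotionNet(nn.Module):
    def __init__(self, num_joints=14):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),   # 6 = two stacked RGB frames
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.heatmap_head = nn.Conv2d(32, num_joints, 1)  # joint localisation
        self.motion_head = nn.Conv2d(32, 2, 1)            # per-pixel (dx, dy) for matching

    def forward(self, frame_t, frame_t1):
        feats = self.encoder(torch.cat([frame_t, frame_t1], dim=1))
        return self.heatmap_head(feats), self.motion_head(feats)

net = TwoFrameMotionNet()
f0, f1 = torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128)
heatmaps, motion = net(f0, f1)
print(heatmaps.shape, motion.shape)
```
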
Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00982
Aojun Zhou, Anbang Yao, Kuan Wang, Yurong Chen
Benefiting from tens of millions of hierarchically stacked learnable parameters, Deep Neural Networks (DNNs) have demonstrated overwhelming accuracy on a variety of artificial intelligence tasks. Conversely, however, the large size of DNN models places a heavy burden on storage, computation and power consumption, which prohibits their deployment on embedded and mobile systems. In this paper, we propose Explicit Loss-error-aware Quantization (ELQ), a new method that can train DNN models with very low-bit parameter values, such as ternary and binary ones, to approximate their 32-bit floating-point counterparts without noticeable loss of prediction accuracy. Unlike existing methods that usually pose the problem as a straightforward approximation of the layer-wise weights or outputs of the original full-precision model (specifically, minimizing the error of the layer-wise weights or the inner products of the weights and the inputs between the original and respective quantized models), our ELQ carefully bridges the loss perturbation caused by weight quantization with an incremental quantization strategy to address DNN quantization. Through explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that such a new optimization method is theoretically reasonable and practically effective. As validated with two mainstream convolutional neural network families (i.e., fully convolutional and non-fully convolutional), our ELQ shows better results than state-of-the-art quantization methods on the large-scale ImageNet classification dataset. Code will be made publicly available.
Pages: 9426-9435
Citations: 80
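A minimal sketch of generic ternary weight quantization, to illustrate the kind of low-bit approximation ELQ regularises (this is a standard thresholding scheme, not the ELQ algorithm): each weight is mapped to {-a, 0, +a} and the resulting weight approximation error is measured explicitly.

```python
# Generic ternary quantization of a weight tensor (illustrative only).
import torch

def ternarize(w, threshold_ratio=0.7):
    # Threshold relative to the mean magnitude; a is the mean of the kept weights.
    delta = threshold_ratio * w.abs().mean()
    mask = (w.abs() > delta).float()
    a = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return a * torch.sign(w) * mask

w = torch.randn(64, 64) * 0.05                      # a full-precision weight matrix
w_q = ternarize(w)
print("distinct levels:", torch.unique(w_q).numel())            # typically 3 (-a, 0, +a)
print("weight approximation error:", (w - w_q).pow(2).mean().item())
```
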
A Common Framework for Interactive Texture Transfer
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00665
Yifang Men, Z. Lian, Yingmin Tang, Jianguo Xiao
In this paper, we present a general-purpose solution to interactive texture transfer problems that better preserves both local structure and visual richness. It is challenging due to the diversity of tasks and the simplicity of required user guidance. The core idea of our common framework is to use multiple custom channels to dynamically guide the synthesis process. For interactivity, users can control the spatial distribution of stylized textures via semantic channels. The structure guidance, acquired by two stages of automatic extraction and propagation of structure information, provides a prior for initialization and preserves the salient structure by searching the nearest neighbor fields (NNF) with structure coherence. Meanwhile, texture coherence is also exploited to maintain similar style with the source image. In addition, we leverage an improved PatchMatch with extended NNF and matrix operations to obtain transformable source patches with richer geometric information at high speed. We demonstrate the effectiveness and superiority of our method on a variety of scenes through extensive comparisons with state-of-the-art algorithms.
Pages: 6353-6362
Citations: 23
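A minimal brute-force sketch of the nearest-neighbor field (NNF) that underlies PatchMatch-style synthesis (the paper's improved PatchMatch, structure channels and matrix speed-ups are omitted): for every target patch, the coordinates of the most similar source patch are recorded.

```python
# Brute-force NNF between a small source texture and a target guidance map.
import numpy as np

def nnf_brute_force(source, target, patch=3):
    h, w = target.shape[0] - patch + 1, target.shape[1] - patch + 1
    sh, sw = source.shape[0] - patch + 1, source.shape[1] - patch + 1
    field = np.zeros((h, w, 2), dtype=int)
    for y in range(h):
        for x in range(w):
            tp = target[y:y + patch, x:x + patch]
            best, best_d = (0, 0), np.inf
            for sy in range(sh):
                for sx in range(sw):
                    d = np.sum((source[sy:sy + patch, sx:sx + patch] - tp) ** 2)
                    if d < best_d:
                        best, best_d = (sy, sx), d
            field[y, x] = best          # coordinates of the best-matching source patch
    return field

rng = np.random.default_rng(1)
src = rng.random((16, 16))      # stand-in source texture (grey levels)
tgt = rng.random((12, 12))      # stand-in target guidance map
print(nnf_brute_force(src, tgt).shape)   # (10, 10, 2)
```
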
Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00217
Shizheng Wang, Wenjuan Liao, P. Surman, Zhigang Tu, Yuanjin Zheng, Junsong Yuan
Multi-layer light field displays are a type of computational three-dimensional (3D) display which has recently gained increasing interest for its holographic-like effect and natural compatibility with 2D displays. However, the major shortcoming, the limited depth range, still cannot be overcome by traditional light field modeling and reconstruction based on multi-layer liquid crystal displays (LCDs). Considering this disadvantage, our paper incorporates a salience-guided depth optimization over a limited display range to calibrate the displayed depth and present the largest possible area of the salient region on a multi-layer light field display. Different from previously reported cascaded light field displays that use a fixed initialization plane as the depth center of the displayed content, our method automatically calibrates the depth initialization based on the salience results derived from the proposed contrast-enhanced salience detection method. Experiments, covering both software simulation and a prototype demonstration, show that the proposed method provides a promising advantage in visual perception for compressive light field displays.
Pages: 2031-2040
Citations: 13
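A minimal toy sketch of salience-guided depth calibration under assumed math (this is not the paper's optimization): the scene depth is re-centered so that the salience-weighted depth falls in the middle of the display's limited range, and the rest is clipped to what the layered LCDs can reproduce.

```python
# Toy depth re-centering driven by a salience map.
import numpy as np

rng = np.random.default_rng(2)
depth = rng.uniform(0.0, 4.0, size=(64, 64))           # scene depth in metres
salience = rng.random((64, 64))                        # stand-in salience map

display_range = (-0.25, 0.25)                          # metres around the LCD stack
weights = salience / salience.sum()
salient_depth = float((weights * depth).sum())         # salience-weighted depth centre

# Shift so the salient content sits at the centre of the displayable range,
# then clip everything else into the range the layered LCDs can reproduce.
calibrated = np.clip(depth - salient_depth, *display_range)
print("salient depth centre:", round(salient_depth, 3))
print("calibrated range:", calibrated.min(), calibrated.max())
```
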