
Latest Publications: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Modeling Facial Geometry Using Compositional VAEs
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00408
Timur M. Bagautdinov, Chenglei Wu, Jason M. Saragih, P. Fua, Yaser Sheikh
We propose a method for learning non-linear face geometry representations using deep generative models. Our model is a variational autoencoder with multiple levels of hidden variables, where lower layers capture global geometry and higher ones encode more local deformations. Building on this, we propose a new parameterization of facial geometry that naturally decomposes the structure of the human face into a set of semantically meaningful levels of detail. This parameterization enables model fitting while capturing varying levels of detail under different types of geometric constraints.
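To illustrate the idea of multiple levels of latent variables, here is a minimal two-level VAE sketch in PyTorch. It is not the authors' mesh-based compositional architecture; the fully connected layers, dimensions, and toy mesh input are assumptions.

```python
import torch
import torch.nn as nn

class TwoLevelVAE(nn.Module):
    """Minimal hierarchical VAE: z_g captures coarse global structure,
    z_l is inferred conditioned on z_g and adds finer local deformation."""
    def __init__(self, x_dim=3 * 1000, g_dim=32, l_dim=128):
        super().__init__()
        self.enc_g = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * g_dim))
        self.enc_l = nn.Sequential(nn.Linear(x_dim + g_dim, 256), nn.ReLU(), nn.Linear(256, 2 * l_dim))
        self.dec = nn.Sequential(nn.Linear(g_dim + l_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return z, kl

    def forward(self, x):
        z_g, kl_g = self.reparam(self.enc_g(x))                        # global geometry level
        z_l, kl_l = self.reparam(self.enc_l(torch.cat([x, z_g], -1)))  # local deformation level
        x_hat = self.dec(torch.cat([z_g, z_l], -1))
        return x_hat, kl_g + kl_l

vae = TwoLevelVAE()
x = torch.randn(4, 3 * 1000)                  # a toy batch of flattened face meshes
x_hat, kl = vae(x)
loss = ((x_hat - x) ** 2).sum(-1).mean() + kl.mean()
```

Fitting under different geometric constraints then amounts to optimizing only the latent level(s) relevant to the constraint, coarse levels for global shape and finer levels for local detail.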
Pages: 3877-3886
Citations: 93
Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00340
Younghyun Jo, Seoung Wug Oh, Jaeyeon Kang, Seon Joo Kim
Video super-resolution (VSR) has recently become even more important for providing high-resolution (HR) content for ultra-high-definition displays. While many deep-learning-based VSR methods have been proposed, most of them rely heavily on the accuracy of motion estimation and compensation. In this paper, we introduce a fundamentally different framework for VSR. We propose a novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, both computed from the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation. With our approach, an HR image is reconstructed directly from the input image using the dynamic upsampling filters, and fine details are added through the computed residual. With the help of a new data augmentation technique, our network can generate much sharper HR videos with temporal consistency than previous methods. We also analyze our network through extensive experiments to show how it handles motion implicitly.
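To make the dynamic-upsampling-filter step concrete, here is a minimal NumPy sketch of applying per-pixel filters to one low-resolution frame. The filter size, scale factor, and uniform stand-in filters are assumptions; the network that predicts the filters from the spatio-temporal neighborhood and the residual branch are omitted.

```python
import numpy as np

def dynamic_upsample(lr, filters, scale=4, k=5):
    """Apply per-pixel dynamic filters to a low-res frame.

    lr:      (H, W) low-res luminance channel.
    filters: (H, W, scale*scale, k*k) predicted filters, one k*k kernel
             per LR pixel and per HR sub-position.
    Returns: (H*scale, W*scale) upsampled frame.
    """
    H, W = lr.shape
    pad = k // 2
    lr_pad = np.pad(lr, pad, mode="edge")
    hr = np.zeros((H * scale, W * scale), dtype=lr.dtype)
    for y in range(H):
        for x in range(W):
            patch = lr_pad[y:y + k, x:x + k].reshape(-1)     # local k*k neighborhood
            for s in range(scale * scale):
                dy, dx = divmod(s, scale)
                hr[y * scale + dy, x * scale + dx] = filters[y, x, s] @ patch
    return hr

lr = np.random.rand(8, 8).astype(np.float32)
filt = np.full((8, 8, 16, 25), 1.0 / 25, dtype=np.float32)   # uniform kernels as a stand-in
hr = dynamic_upsample(lr, filt)                               # shape (32, 32)
```

Because each output pixel is a locally adapted filtering of its own neighborhood, motion is absorbed into the predicted kernels rather than handled by an explicit flow-and-warp stage.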
Pages: 3224-3232
Citations: 439
Occlusion-Aware Rolling Shutter Rectification of 3D Scenes
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00073
Subeesh Vasu, R. MaheshMohanM., A. Rajagopalan
A vast majority of contemporary cameras employ a rolling shutter (RS) mechanism to capture images. Due to this sequential mechanism, images acquired with a moving camera are subject to the rolling shutter effect, which manifests as geometric distortions. In this work, we consider the specific scenario of a fast-moving camera, wherein the rolling shutter distortions are not only predominant but also depth-dependent, which in turn results in intra-frame occlusions. To this end, we develop a first-of-its-kind pipeline to recover the latent image of a 3D scene from a set of such RS-distorted images. The proposed approach sequentially recovers both the camera motion and the scene structure while accounting for RS and occlusion effects. Subsequently, we perform depth- and occlusion-aware rectification of the RS images to yield the desired latent image. Our experiments on synthetic and real image sequences show that the proposed approach achieves state-of-the-art results.
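As background, the row-wise readout model that such distortions stem from can be sketched in a few lines. The constant-velocity camera motion and the function names here are assumptions; the paper's pipeline additionally estimates scene depth and handles the intra-frame occlusions that depth-dependent distortion exposes.

```python
import numpy as np

def row_capture_times(n_rows, frame_start, readout_time):
    """Each image row r is exposed slightly later than row r-1."""
    return frame_start + readout_time * np.arange(n_rows) / n_rows

def per_row_translation(times, velocity):
    """Under a constant-velocity camera, every row sees a different pose."""
    return np.outer(times, velocity)          # (n_rows, 3) camera translation per row

times = row_capture_times(n_rows=480, frame_start=0.0, readout_time=0.03)
shifts = per_row_translation(times, velocity=np.array([0.5, 0.0, 0.0]))
# Rectification warps each row back with the inverse of its own pose; when the
# distortion depends on depth, this reveals scene points occluded at capture time.
```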
Pages: 636-645
Citations: 33
Egocentric Activity Recognition on a Budget
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00625
Rafael Possas, Sheila M. Pinto-Caceres, F. Ramos
Recent advances in embedded technology have enabled more pervasive machine learning. One of the common applications in this field is Egocentric Activity Recognition (EAR), where users wearing a device such as a smartphone or smartglasses receive feedback from the embedded device. Recent research on activity recognition has mainly focused on improving accuracy by using resource-intensive techniques such as multi-stream deep networks. Although this approach has produced state-of-the-art results, in most cases it neglects the natural resource constraints (e.g., battery) of wearable devices. We develop a model-free reinforcement learning method to learn energy-aware policies that maximize the use of low-energy-cost predictors while keeping competitive accuracy levels. Our results show that a policy trained on an egocentric dataset is able to use the synergy between motion and vision sensors to effectively trade off energy expenditure and accuracy on smartglasses operating in realistic, real-world conditions.
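A toy sketch of the accuracy-versus-energy trade-off such a policy optimizes is given below. The reward shape, the hypothetical energy costs and accuracies, and the stateless bandit-style update are all assumptions; the paper learns a richer, state-dependent policy model-free from real sensor streams.

```python
import random

# Two predictors: a cheap motion-sensor model and an expensive vision model.
ACTIONS = ["motion", "vision"]
ENERGY = {"motion": 1.0, "vision": 25.0}     # hypothetical per-inference energy costs
LAMBDA = 0.02                                # weight of the energy penalty

def reward(correct, action):
    return (1.0 if correct else 0.0) - LAMBDA * ENERGY[action]

q = {a: 0.0 for a in ACTIONS}                # stateless action values, for illustration only
alpha, eps = 0.1, 0.1
for step in range(10000):
    a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
    # Simulated accuracies: vision is more accurate but costs far more energy.
    correct = random.random() < (0.9 if a == "vision" else 0.7)
    q[a] += alpha * (reward(correct, a) - q[a])

print(q)   # with these made-up numbers the cheap predictor wins, because its
           # accuracy gap is smaller than the energy penalty it avoids
```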
Pages: 5967-5976
Citations: 37
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00410
Despoina Paschalidou, Ali O. Ulusoy, Carolin Schmitt, L. Gool, Andreas Geiger
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNNs) allow learning the entire task from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion. In contrast, classical approaches based on Markov Random Fields (MRFs) with ray potentials explicitly model these physical processes, but they cannot cope with large surface-appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piecewise-trained baseline, hand-crafted models, and other learning-based approaches.
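For intuition about ray potentials, here is a minimal NumPy sketch of the per-ray depth distribution commonly used in such formulations: the probability that a voxel is the first occupied one the ray hits. Voxel traversal, the CNN features, and the message passing between rays are omitted, and the function name is an assumption.

```python
import numpy as np

def ray_depth_distribution(occ):
    """occ[i]: probability that the i-th voxel along the ray is occupied.

    Returns P(depth = i) = occ[i] * prod_{j < i} (1 - occ[j]), i.e. the
    probability that voxel i is the first occupied voxel along the ray.
    """
    free_before = np.concatenate(([1.0], np.cumprod(1.0 - occ)[:-1]))
    return occ * free_before

occ = np.array([0.05, 0.10, 0.70, 0.90, 0.30])
p_depth = ray_depth_distribution(occ)                       # peaks at the third voxel
expected_index = (np.arange(len(occ)) * p_depth).sum() / p_depth.sum()
```

This is the occlusion reasoning that the MRF contributes, while the CNN supplies per-voxel evidence that is robust to appearance changes across views.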
Pages: 3897-3906
Citations: 77
PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00707
Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han
Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore this motion cue using optical flow. However, optical flow usually contains motion of both the human bodies of interest and the undesired background. This "noisy" motion representation makes pose estimation and action recognition very challenging in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur and remaining robust to occlusion. To learn PoseFlow at a mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential for improving two challenging tasks in human video analysis: pose estimation and action recognition.
Pages: 6762-6770
Citations: 31
Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00982
Aojun Zhou, Anbang Yao, Kuan Wang, Yurong Chen
Benefiting from tens of millions of hierarchically stacked learnable parameters, Deep Neural Networks (DNNs) have demonstrated overwhelming accuracy on a variety of artificial intelligence tasks. Conversely, however, the large size of DNN models places a heavy burden on storage, computation and power consumption, which prohibits their deployment on embedded and mobile systems. In this paper, we propose Explicit Loss-error-aware Quantization (ELQ), a new method that can train DNN models with very low-bit parameter values, such as ternary and binary ones, to approximate their 32-bit floating-point counterparts without noticeable loss of prediction accuracy. Unlike existing methods, which usually pose the problem as a straightforward approximation of the layer-wise weights or outputs of the original full-precision model (specifically, minimizing the error of the layer-wise weights, or of the inner products of the weights and inputs, between the original and quantized models), our ELQ elaborately bridges the loss perturbation caused by weight quantization with an incremental quantization strategy to address DNN quantization. By explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that this new optimization method is theoretically reasonable and practically effective. As validated on two mainstream convolutional neural network families (i.e., fully convolutional and non-fully convolutional), our ELQ achieves better results than state-of-the-art quantization methods on the large-scale ImageNet classification dataset. Code will be made publicly available.
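For context on what "very low-bit parameter values" means in practice, here is a minimal sketch of ternary weight quantization with a per-layer scaling factor. The threshold heuristic is a common choice from the ternary-quantization literature and is an assumption here; it is not ELQ's loss-aware, incremental update rule.

```python
import numpy as np

def ternary_quantize(w, t=0.7):
    """Map full-precision weights to {-a, 0, +a} with a per-layer scale a."""
    delta = t * np.abs(w).mean()                          # weights below this snap to zero
    mask = np.abs(w) > delta
    a = np.abs(w[mask]).mean() if mask.any() else 0.0     # scale fitted to surviving weights
    return a * np.sign(w) * mask

w = np.random.randn(256, 256).astype(np.float32)
w_q = ternary_quantize(w)
print(np.unique(np.round(w_q, 6)).size)                   # only three distinct values remain
```

ELQ goes further by explicitly regularizing the loss perturbation this rounding causes and by quantizing the weights incrementally rather than all at once.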
Pages: 9426-9435
Citations: 80
SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00511
Xiao Wang, Chenglong Li, B. Luo, Jin Tang
Existing visual trackers are easily disturbed by occlusion, blur and large deformation. We argue that the performance of existing visual trackers may be limited by the following issues: i) adopting a dense sampling strategy to generate positive examples makes them less diverse; ii) training data covering different challenging factors are limited, even when a large training dataset is collected. Collecting an even larger training dataset is the most intuitive remedy, but it still may not cover all situations, and the positive samples remain monotonous. In this paper, we propose to generate hard positive samples via adversarial learning for visual tracking. Specifically, we assume the target objects all lie on a manifold; hence, we introduce the positive samples generation network (PSGN) to sample massive, diverse training data by traversing over the constructed target object manifold. The generated diverse target object images can enrich the training dataset and enhance the robustness of visual trackers. To make the tracker more robust to occlusion, we adopt the hard positive transformation network (HPTN), which generates hard samples for the tracking algorithm to recognize. We train this network with deep reinforcement learning to automatically occlude the target object with a negative patch. Based on the generated hard positive samples, we train a Siamese network for visual tracking, and our experiments validate the effectiveness of the introduced algorithm. The project page for this paper is available online.
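As a simplified stand-in for the hard-positive idea, the sketch below pastes a random background patch onto the target crop while keeping its positive label. In the paper the occluder location is chosen by the learned HPTN policy rather than at random, so this is an assumption-laden illustration, not the authors' method.

```python
import numpy as np

def occlude_with_patch(target, background, patch_size=24, rng=None):
    """Paste a random background crop onto a random location of the target crop."""
    rng = rng or np.random.default_rng()
    H, W, _ = target.shape
    ph = pw = min(patch_size, H, W)
    ty, tx = rng.integers(0, H - ph + 1), rng.integers(0, W - pw + 1)
    by = rng.integers(0, background.shape[0] - ph + 1)
    bx = rng.integers(0, background.shape[1] - pw + 1)
    out = target.copy()
    out[ty:ty + ph, tx:tx + pw] = background[by:by + ph, bx:bx + pw]
    return out

target = np.random.rand(64, 64, 3)
background = np.random.rand(200, 200, 3)
hard_positive = occlude_with_patch(target, background)   # still labeled as the target
```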
Pages: 4864-4873
Citations: 109
SSNet: Scale Selection Network for Online 3D Action Prediction
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00871
Jun Liu, Amir Shahroudy, G. Wang, Ling-yu Duan, A. Kot
In action prediction (early action recognition), the goal is to predict the class label of an ongoing action from the part observed so far. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the time axis. As the temporal scale of the observed part of the ongoing action varies significantly across progress levels, we propose a novel window scale selection scheme that makes our network focus on the performed part of the ongoing action and suppress noise from previous actions at each time step. Furthermore, an activation sharing scheme is proposed to handle the overlapping computations among adjacent steps, which allows our model to run more efficiently. Extensive experiments on two challenging datasets show the effectiveness of the proposed action prediction framework.
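A minimal PyTorch sketch of causal dilated temporal convolutions over streaming skeleton features is shown below; deeper layers cover longer observation windows. The layer widths, input layout, class count, and causal left-padding are placeholders, and SSNet's scale-selection and activation-sharing mechanisms are omitted.

```python
import torch
import torch.nn as nn

class DilatedTemporalNet(nn.Module):
    """Stack of causal 1D convolutions with growing dilation; each deeper layer
    sees a longer temporal window of the ongoing action."""
    def __init__(self, c_in=75, c_hid=64, n_classes=60, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, c = [], c_in
        for d in dilations:
            layers += [nn.ConstantPad1d((2 * d, 0), 0.0),      # pad on the left only (causal)
                       nn.Conv1d(c, c_hid, kernel_size=3, dilation=d),
                       nn.ReLU()]
            c = c_hid
        self.body = nn.Sequential(*layers)
        self.cls = nn.Conv1d(c_hid, n_classes, kernel_size=1)  # per-frame class scores

    def forward(self, x):            # x: (batch, joints * 3, time)
        return self.cls(self.body(x))

net = DilatedTemporalNet()
skeleton_stream = torch.randn(2, 75, 100)   # 2 sequences, 25 joints x 3 coords, 100 frames
scores = net(skeleton_stream)               # (2, 60, 100): a prediction at every time step
```

Scale selection in SSNet then picks, at each step, the layer whose receptive window best matches the portion of the action observed so far.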
Pages: 8349-8358
Citations: 54
A Revised Underwater Image Formation Model
Pub Date: 2018-06-01 | DOI: 10.1109/CVPR.2018.00703
D. Akkaynak, T. Treibitz
The current underwater image formation model descends from atmospheric dehazing equations, where attenuation is a weak function of wavelength. We recently showed that this model introduces significant errors and dependencies in the estimation of the direct transmission signal because, underwater, light attenuates in a wavelength-dependent manner. Here, we show that the backscattered signal derived from the current model also suffers from dependencies that were previously unaccounted for. In doing so, we use oceanographic measurements to derive the physically valid space of backscatter, and further show that the wideband coefficients that govern backscatter are different from those that govern direct transmission, even though the current model treats them as the same. We propose a revised equation for underwater image formation that takes these differences into account, and validate it through in situ experiments underwater. This revised model may explain frequent instabilities of current underwater color reconstruction models, and calls for the development of new methods.
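For reference, the distinction the abstract describes can be written compactly. The symbols below follow common usage (I_c: captured image in channel c, J_c: unattenuated scene radiance, B_c^infinity: veiling light, z: range) and are an assumed notation rather than a quotation from the paper, which further details what each coefficient depends on.

```latex
% Atmospheric-style model: a single wideband coefficient \beta_c governs both terms
I_c = J_c \, e^{-\beta_c z} + B_c^{\infty} \left(1 - e^{-\beta_c z}\right)

% Revised model: direct transmission and backscatter have distinct coefficients,
% \beta_c^{D} \neq \beta_c^{B}, each with its own dependencies
I_c = J_c \, e^{-\beta_c^{D} z} + B_c^{\infty} \left(1 - e^{-\beta_c^{B} z}\right)
```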
Pages: 6723-6732
Citations: 172