2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition: Latest Articles

CNN Driven Sparse Multi-level B-Spline Image Registration

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00967

Pingge Jiang, J. Shackleford

Traditional single-grid and pyramidal B-spline parameterizations used in deformable image registration require users to specify control point spacing configurations capable of accurately capturing both global and complex local deformations. In many cases, such grid configurations are non-obvious and largely selected based on user experience. Recent regularization methods imposing sparsity upon the B-spline coefficients throughout simultaneous multi-grid optimization, however, have provided a promising means of determining suitable configurations automatically. Unfortunately, imposing sparsity on over-parameterized B-spline models is computationally expensive and introduces additional difficulties such as undesirable local minima in the B-spline coefficient optimization process. To overcome these difficulties in determining B-spline grid configurations, this paper investigates the use of convolutional neural networks (CNNs) to learn and infer expressive sparse multi-grid configurations prior to B-spline coefficient optimization. Experimental results show that multi-grid configurations produced in this fashion using our CNN based approach provide registration quality comparable to L1-norm constrained over-parameterizations in terms of exactness, while exhibiting significantly reduced computational requirements.
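
As a rough illustration of what a sparse multi-grid B-spline parameterization looks like, the sketch below evaluates a 1D displacement as the sum of contributions from several uniform cubic B-spline control grids. It is not the authors' implementation: the function names, grid spacings and coefficient values are hypothetical, and the CNN that would select the active control points is omitted entirely.

```python
import numpy as np

def cubic_bspline_basis(u):
    """Four uniform cubic B-spline weights for a fractional offset u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    ])

def displacement_1d(x, coeffs, spacing):
    """Displacement at position x from one control grid with the given spacing."""
    cell = int(np.floor(x / spacing))
    u = x / spacing - cell
    weights = cubic_bspline_basis(u)
    # coeffs is padded so that cell i is influenced by coeffs[i : i + 4].
    return float(weights @ coeffs[cell:cell + 4])

def multilevel_displacement(x, levels):
    """Sum the contributions of several grids; levels is a list of (coeffs, spacing)."""
    return sum(displacement_1d(x, c, s) for c, s in levels)

# Hypothetical two-level configuration on a 1D signal of length 64.
coarse = (np.full(64 // 32 + 3, 2.0), 32)   # coarse grid: constant shift of 2
fine_coeffs = np.zeros(64 // 8 + 3)
fine_coeffs[4] = 1.5                        # sparse fine grid: one active control point
fine = (fine_coeffs, 8)

far = multilevel_displacement(60.0, [coarse, fine])   # outside the fine point's support
near = multilevel_displacement(12.0, [coarse, fine])  # inside the fine point's support
print(far, near)
```

Because each cubic B-spline control point only influences four grid cells, a fine grid with mostly zero coefficients adds local detail without changing the deformation elsewhere, which is the property a sparse configuration relies on.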

Pages: 9281-9289

Citations: 10

A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00439

Riccardo Roveri, Lukas Rahmann, C. Öztireli, M. Gross

We propose a novel neural network architecture for point cloud classification. Our key idea is to automatically transform the 3D unordered input data into a set of useful 2D depth images, and classify them by exploiting well performing image classification CNNs. We present new differentiable module designs to generate depth images from a point cloud. These modules can be combined with any network architecture for processing point clouds. We utilize them in combination with state-of-the-art classification networks, and get results competitive with the state of the art in point cloud classification. Furthermore, our architecture automatically produces informative images representing the input point cloud, which could be used for further applications such as point cloud visualization.
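
To make the idea of turning a point cloud into a depth image concrete, here is a minimal, non-differentiable sketch that renders an orthographic depth map along a fixed axis with NumPy. The abstract's modules are differentiable and choose the views automatically; this sketch does neither, and the function name, image size and viewing direction are assumptions made purely for illustration.

```python
import numpy as np

def depth_image(points, size=32):
    """Orthographic depth map of an (N, 3) point cloud viewed along the z axis."""
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    # Map x and y to integer pixel coordinates in [0, size - 1].
    scale = (size - 1) / np.maximum(hi - lo, 1e-12)
    cols, rows = np.floor((xy - lo) * scale + 0.5).astype(int).T
    depth = np.full((size, size), np.inf)   # pixels with no point stay at infinity
    # Keep the smallest z per pixel, i.e. the point nearest to the viewer.
    np.minimum.at(depth, (rows, cols), points[:, 2])
    return depth

# Three points, two of which fall into the same pixel.
pts = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 2.0], [1.0, 1.0, 7.0]])
print(depth_image(pts, size=2))
```

The resulting 2D array can be fed to any image classification CNN, which is the general route the abstract describes.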

Citations: 56

Recognize Actions by Disentangling Components of Dynamics

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00687

Yue Zhao, Yuanjun Xiong, Dahua Lin

Despite the remarkable progress in action recognition over the past several years, existing methods remain limited in efficiency and effectiveness. Methods that treat appearance and motion as separate streams usually incur the cost of optical flow computation, while those relying on 3D convolution over the original video frames often yield inferior performance in practice. In this paper, we propose a new ConvNet architecture for video representation learning, which derives disentangled components of dynamics purely from raw video frames, without the need for optical flow estimation. In particular, the learned representation comprises three components that represent static appearance, apparent motion, and appearance changes. We introduce 3D pooling, cost volume processing, and warped feature differences, respectively, to extract these three components. These modules are incorporated as three branches in our unified network, which share the underlying features and are learned jointly in an end-to-end manner. On two large datasets, UCF101 [22] and Kinetics [16], our method achieves competitive performance with high efficiency, using only the RGB frame sequence as input.

Pages: 6566-6575

Citations: 60

Learning to Evaluate Image Captioning

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00608

Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge J. Belongie

Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Second, each metric has well-known blind spots with respect to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlates well with human judgments, but fails to capture the syntactic structure of a sentence. To address these two challenges, we propose a novel learning-based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. In addition, we propose a data augmentation scheme that explicitly incorporates pathological transformations as negative examples during training. The proposed metric is evaluated with three kinds of robustness tests and by its correlation with human judgments. Extensive experiments show that the proposed data augmentation scheme not only makes our metric more robust to several pathological transformations, but also improves its correlation with human judgments. Our metric outperforms other metrics on both caption-level human correlation on Flickr 8k and system-level human correlation on COCO. The proposed approach can serve as a learning-based evaluation metric that is complementary to existing rule-based metrics.

Pages: 5804-5812

Citations: 113

High-Speed Tracking with Multi-kernel Correlation Filters

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00512

Ming Tang, Bin Yu, Fan Zhang, Jinqiao Wang

Correlation filter (CF) based trackers currently rank among the top performers. Nevertheless, only some of them, such as KCF [26] and MKCF [48], are able to exploit the powerful discriminability of non-linear kernels. Although MKCF achieves more powerful discriminability than KCF by introducing multi-kernel learning (MKL) into KCF, its improvement over KCF is quite limited and its computational burden is significantly higher than that of KCF. In this paper, we introduce MKL into KCF in a different way than MKCF does. We reformulate the MKL version of the CF objective function with its upper bound, significantly alleviating the negative mutual interference of different kernels. Our novel MKCF tracker, MKCFup, outperforms KCF and MKCF by large margins and can still run at very high fps. Extensive experiments on public data sets show that our method is superior to state-of-the-art algorithms for target objects with small movement, while running at very high speed.
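
For readers unfamiliar with correlation filter tracking, the following sketch shows the basic single-kernel building block: ridge regression solved in the Fourier domain, followed by detection of a cyclic shift. It uses one linear kernel on a single-channel patch and leaves out the cosine window, feature extraction, kernel normalisation and model update that a practical tracker would need. It is not MKCFup and does not implement multi-kernel learning; the function names and the regularisation value are assumptions.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at (0, 0) with cyclic wrap-around."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    xs = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    return np.exp(-(ys**2 + xs**2) / (2 * sigma**2))

def train(x, y, lam=1e-4):
    """Dual ridge-regression coefficients in the Fourier domain (linear kernel)."""
    xf = np.fft.fft2(x)
    kf = np.conj(xf) * xf               # auto-correlation of the template
    return np.fft.fft2(y) / (kf + lam)

def detect(alpha_f, x, z):
    """Location of the response peak for a new patch z, given template x."""
    kzf = np.conj(np.fft.fft2(x)) * np.fft.fft2(z)   # cross-correlation of x and z
    response = np.real(np.fft.ifft2(alpha_f * kzf))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)

rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))
shifted = np.roll(template, (3, 5), axis=(0, 1))     # simulate target motion
alpha_f = train(template, gaussian_label(template.shape))
peak = detect(alpha_f, template, shifted)
print(peak)
```

Training and detection both reduce to a handful of FFTs and element-wise operations, which is why this family of trackers can run at high frame rates.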

Citations: 76

Kernelized Subspace Pooling for Deep Local Descriptors

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00200

Xing Wei, Yue Zhang, Yihong Gong, N. Zheng

Representing local image patches in an invariant and discriminative manner is an active research topic in computer vision. It has recently been demonstrated that local feature learning based on deep Convolutional Neural Networks (CNNs) can significantly improve matching performance. Previous works on learning such descriptors have focused on developing various loss functions, regularizations and data mining strategies to learn discriminative CNN representations. Such methods, however, offer little analysis of how to increase the geometric invariance of their generated descriptors. In this paper, we propose a descriptor that is both highly invariant and highly discriminative. These abilities come from a novel pooling method, dubbed Subspace Pooling (SP), which is invariant to a range of geometric deformations. To further increase the discriminative power of our descriptor, we propose a simple distance kernel integrated into the marginal triplet loss that helps to focus on hard examples in CNN training. Finally, we show that by combining SP with the projection distance metric [13], the generated feature descriptor is equivalent to that of the Bilinear CNN model [22], but outperforms the latter with much lower memory and computation consumption. The proposed method is simple, easy to understand and achieves good performance. Experimental results on several patch matching benchmarks show that our method significantly outperforms the state of the art.
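
The sketch below illustrates one plausible reading of pooling a feature map into a subspace and comparing two such subspaces with the projection metric. It assumes the pooled descriptor is the span of the leading left singular vectors of the feature map reshaped to channels by spatial positions; that assumption, the function names and the sizes are illustrative and are not taken from the paper. The distance kernel and triplet loss mentioned in the abstract are not shown.

```python
import numpy as np

def subspace_pool(features, k):
    """Orthonormal (C, k) basis of the top-k left singular subspace of a (C, H*W) map."""
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k]

def projection_distance(u1, u2):
    """Projection metric between two k-dimensional subspaces with orthonormal bases."""
    k = u1.shape[1]
    overlap = np.linalg.norm(u1.T @ u2, "fro") ** 2
    # Clamp at zero to guard against tiny negative values from rounding.
    return float(np.sqrt(max(k - overlap, 0.0)))

rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 49))             # 16 channels, 7 x 7 spatial positions
shuffled = fmap[:, rng.permutation(49)]          # same features, positions rearranged
other = rng.standard_normal((16, 49))            # an unrelated feature map

same = projection_distance(subspace_pool(fmap, 4), subspace_pool(shuffled, 4))
diff = projection_distance(subspace_pool(fmap, 4), subspace_pool(other, 4))
print(same, diff)
```

Rearranging spatial positions only permutes the columns of the matrix, which leaves its left singular subspace unchanged, so the first distance is zero up to rounding. This is a simple instance of the kind of invariance a subspace representation can provide.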

Pages: 1867-1875

Citations: 41

Deep Adversarial Subspace Clustering

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00172

Pan Zhou, Yunqing Hou, Jiashi Feng

Most existing subspace clustering methods hinge on self-expression of handcrafted representations and are unaware of potential clustering errors. Thus they perform unsatisfactorily on real data with complex underlying subspaces. To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise sample representation learning and subspace clustering. Specifically, DASC consists of a subspace clustering generator and a quality-verifying discriminator, which learn against each other. The generator produces subspace estimation and sample clustering. The discriminator evaluates current clustering performance by inspecting whether the re-sampled data from estimated subspaces have consistent subspace properties, and supervises the generator to progressively improve subspace clustering. Experimental results on the handwritten recognition, face and object clustering tasks demonstrate the advantages of DASC over shallow subspace clustering models and the few existing deep ones. Moreover, to the best of our knowledge, this is the first successful application of a GAN-like model to unsupervised subspace clustering, which also paves the way for deep learning to solve other unsupervised learning problems.

Pages: 1596-1604

Citations: 142

Seeing Temporal Modulation of Lights from Standard Cameras

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00670

Naoki Sakakibara, Fumihiko Sakaue, J. Sato

In this paper, we propose a novel method for measuring the temporal modulation of lights by using off-the-shelf cameras. In particular, we show that the invisible flicker patterns of various lights such as fluorescent lights can be measured by a simple combination of an off-the-shelf camera and any moving object with specular reflection. Unlike existing methods, we need neither high-speed cameras nor specially designed coded-exposure cameras. Based on the extracted flicker patterns of environment lights, we also propose an efficient method for removing motion blur from images. The proposed method enables us to deblur images with better frequency characteristics, which are induced by the flicker patterns of environment lights. Experiments on real images show the efficiency of the proposed method.

Citations: 3

Visual Grounding via Accumulated Attention

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00808

Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence or even a multi-round dialogue. There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object. Most existing methods combine all the information in a straightforward manner, which may suffer from the problem of information redundancy (i.e. ambiguous query, complicated image and a large number of objects). In this paper, we formulate these challenges as three attention problems and propose an accumulated attention (A-ATT) mechanism to reason among them jointly. Our A-ATT mechanism can circularly accumulate the attention for useful information in the image, query, and objects, while noise is gradually ignored. We evaluate the performance of A-ATT on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and GuessWhat?!), and the experimental results show the superiority of the proposed method in terms of accuracy.

Citations: 173

Image Blind Denoising with Generative Adversarial Network Based Noise Modeling

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00333

Jingwen Chen, Jiawei Chen, Hongyang Chao, Ming Yang

In this paper, we consider a typical image blind denoising problem, which is to remove unknown noise from noisy images. As we all know, discriminative learning based methods, such as DnCNN, can achieve state-of-the-art denoising results, but they are not applicable to this problem due to the lack of paired training data. To tackle the barrier, we propose a novel two-step framework. First, a Generative Adversarial Network (GAN) is trained to estimate the noise distribution over the input noisy images and to generate noise samples. Second, the noise patches sampled from the first step are utilized to construct a paired training dataset, which is used, in turn, to train a deep Convolutional Neural Network (CNN) for denoising. Extensive experiments have been done to demonstrate the superiority of our approach in image blind denoising.

Citations: 430
