2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition: Latest Articles

A Fast Resection-Intersection Method for the Known Rotation Problem

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00318

The known rotation problem is a special case of structure-from-motion in which the absolute orientations of the cameras are known. When it is formulated as a minimax ($L_\infty$) problem on reprojection errors, the problem is an instance of pseudo-convex programming. The problem is theoretically tractable, but existing methods can be very time-consuming on large-scale data (thousands of views, tens of thousands of scene points). In this paper, we devise a fast algorithm for the known rotation problem. Our approach alternates between pose estimation and triangulation (i.e., resection-intersection) to break the problem into multiple simpler instances of pseudo-convex programming. The key to the vastly superior performance of our method is a novel minimum enclosing ball (MEB) technique for calculating the update steps. This technique obviates the need for convex optimisation routines and greatly reduces the memory footprint. We demonstrate the practicality of our method on large-scale problem instances that easily overwhelm current state-of-the-art algorithms.
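The abstract does not spell out the authors' MEB formulation. As a generic illustration only, the sketch below computes an approximate minimum enclosing ball with the Badoiu-Clarkson iteration. This is a swapped-in textbook method, not the paper's update rule. The centre repeatedly moves toward the farthest point with a shrinking step, and after about 1/eps² steps the radius is within a factor (1 + eps) of optimal. The name `approx_meb` and its `eps` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def approx_meb(points: Sequence[Point], eps: float = 0.05) -> Tuple[Point, float]:
    """Hypothetical helper: (1 + eps)-approximate minimum enclosing ball.

    Generic Badoiu-Clarkson iteration (not the paper's method): step the centre
    toward the farthest point with step size 1 / (i + 1).
    """
    if not points:
        raise ValueError("need at least one point")
    centre = list(points[0])  # start at an arbitrary input point
    for i in range(1, math.ceil(1.0 / eps**2) + 1):
        farthest = max(points, key=lambda p: math.dist(p, centre))
        step = 1.0 / (i + 1)
        centre = [c + step * (f - c) for c, f in zip(centre, farthest)]
    radius = max(math.dist(p, centre) for p in points)  # smallest ball at this centre
    return tuple(centre), radius
```

For the points (0, 0), (2, 0) and (1, 0.5), the optimal ball has centre (1, 0) and radius 1. With eps = 0.05 the returned radius lies in [1, 1.05].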

Citations: 10

Multi-level Fusion Based 3D Object Detection from Monocular Images

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00249

Bin Xu, Zhenzhong Chen

In this paper, we present an end-to-end, multi-level fusion based framework for 3D object detection from a single monocular image. The network has two parts. The first generates 2D region proposals, and the second simultaneously predicts objects' 2D locations, orientations, dimensions and 3D locations. Building on a stand-alone module that estimates disparity and computes a 3D point cloud, we introduce a multi-level fusion scheme. First, we encode the disparity information as a front-view feature representation and fuse it with the RGB image to enhance the input. Second, we combine features extracted from the original input and from the point cloud to boost object detection. For 3D localization, we add an extra stream that predicts location directly from the point cloud and adds it to the location prediction described above. The proposed algorithm outputs both 2D and 3D object detection results end to end, with only a single RGB image as input. Experimental results on the challenging KITTI benchmark show that our algorithm significantly outperforms state-of-the-art monocular methods.
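The framework relies on a stand-alone module that turns estimated disparity into a 3D point cloud. Below is a minimal sketch of that conversion using standard pinhole back-projection: Z = fB/d, X = (u - cx)Z/f, Y = (v - cy)Z/f. It assumes a rectified, stereo-style disparity map d in pixels, a focal length f in pixels, a principal point (cx, cy) and a baseline B in metres. The function `disparity_to_points` is hypothetical and is not taken from the paper.

```python
from typing import List, Sequence, Tuple


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: back-project a disparity map into camera-frame 3D points."""
    points = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no valid disparity: skip the pixel
            z = focal_px * baseline_m / d  # depth from disparity
            x = (u - cx) * z / focal_px  # pinhole back-projection
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points
```

For example, a pixel at u = 1 with disparity 10, f = 100 and B = 0.5 maps to depth 5 m and x = 0.05 m.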

Pages: 2345-2353

Citations: 260

Neural Style Transfer via Meta Networks

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00841

Falong Shen, Shuicheng Yan, Gang Zeng

In this paper, we propose a novel method for neural style transfer that generates the parameters of an image transformation network in a single feed-forward pass of a meta network. Recent work on style transfer typically trains a new image transformation network for every new style. The style is encoded in the network parameters through an enormous number of stochastic gradient descent iterations, so the network cannot generalize to new styles at inference time. To tackle these issues, we build a meta network that takes in the style image and directly generates a corresponding image transformation network. Optimization-based methods must train for every style, whereas our meta network handles an arbitrary new style within 19 milliseconds on a single modern GPU. The fast image transformation network generated by our meta network is only 449 KB and can run in real time on a mobile device. We also investigate the manifold of style transfer networks by manipulating the hidden features of the meta network. Experiments validate the effectiveness of our method. Code and trained models will be released.
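Below is a toy sketch of the meta-network idea, built on stated assumptions:

- the style image is summarized by per-channel means and standard deviations;
- the "meta network" is a single linear layer;
- the generated "transformation network" is just a per-channel affine map.

The paper's meta network and transformation network are deep convolutional models. Every name below (`style_descriptor`, `meta_network`, `build_transform_net`) is hypothetical. The sketch shows only that a new style yields new network parameters in one forward pass, with no per-style gradient descent.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # list of pixels, each a list of channel values


def style_descriptor(image: Image) -> List[float]:
    """Hypothetical style summary: per-channel means followed by per-channel stds."""
    n, c = len(image), len(image[0])
    means = [sum(p[k] for p in image) / n for k in range(c)]
    stds = [(sum((p[k] - means[k]) ** 2 for p in image) / n) ** 0.5 for k in range(c)]
    return means + stds


def meta_network(descriptor: Sequence[float], weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Hypothetical one-layer meta network: style descriptor -> network parameters."""
    return [sum(w * d for w, d in zip(row, descriptor)) + b for row, b in zip(weights, bias)]


def build_transform_net(params: Sequence[float], channels: int) -> Callable[[Image], Image]:
    """Turn a flat parameter vector into a per-channel affine 'transformation network'."""
    gamma, beta = params[:channels], params[channels:2 * channels]

    def transform(image: Image) -> Image:
        return [[g * x + b for g, x, b in zip(gamma, pixel, beta)] for pixel in image]

    return transform
```

Consider a single-channel style image with values 2 and 4, which gives mean 3 and std 1. Here the meta-network weights are chosen by hand to copy the std into the scale and the mean into the shift. The generated network then maps content values 0 and 1 to 3 and 4.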

Citations: 105

Robust Hough Transform Based 3D Reconstruction from Circular Light Fields

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00765

A. Vianello, J. Ackermann, M. Diebold, B. Jähne

Light-field imaging is based on images taken on a regular grid, so high-quality 3D reconstructions can be obtained by analyzing orientations in epipolar plane images (EPIs). Unfortunately, such data allow only one side of the object to be evaluated. Moreover, most approaches require constant intensity along each orientation. This paper presents a novel method that reconstructs depth information from data acquired with a circular camera motion, termed circular light fields. With this approach, it is possible to recover the full 360° view of target objects. Circular light fields also allow depth to be retrieved from datasets acquired with telecentric lenses, which is not possible with linear light fields. The proposed method finds the trajectories of 3D points in the EPIs with a modified Hough transform. It operates on binary EPI-edge images, which yield reliable depth information and also remove the requirement of constant intensity along trajectories. Experimental results on synthetic and real datasets demonstrate the quality of the proposed algorithm.
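The Hough step can be illustrated under an assumed imaging model: an orthographic (telecentric) camera rotating about the object. Under that model, a scene point traces a sinusoid u(θ) = A·sin(θ + φ) in the circular EPI. A relates to the point's distance from the rotation axis and φ to its angular position. The sketch below is plain Hough voting over (A, φ) from binary edge pixels. It is not the authors' modified transform, and `hough_sinusoid` and its parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_sinusoid(
    edges: Iterable[Tuple[float, float]],
    a_max: float,
    a_bins: int,
    phi_bins: int,
    min_sin: float = 0.2,
) -> Tuple[float, float]:
    """Hypothetical helper: vote for the sinusoid u = A * sin(theta + phi) best supported by edges.

    edges holds (theta, u) pairs of edge pixels in the circular EPI.
    """
    votes = Counter()
    a_step = a_max / a_bins
    for theta, u in edges:
        for j in range(phi_bins):
            phi = 2.0 * math.pi * j / phi_bins
            s = math.sin(theta + phi)
            if abs(s) < min_sin:
                continue  # amplitude is ill-conditioned near zero crossings
            a = u / s
            if 0.0 <= a <= a_max:  # (A, phi) and (-A, phi + pi) are the same curve
                votes[(round(a / a_step), j)] += 1
    if not votes:
        raise ValueError("no votes cast")
    (a_idx, j), _ = votes.most_common(1)[0]
    return a_idx * a_step, 2.0 * math.pi * j / phi_bins
```

Restricting A to non-negative values avoids double-counting each curve. Skipping pixels where |sin(θ + φ)| is small avoids dividing by near-zero values.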

Pages: 7327-7335

Citations: 9

Augmenting Crowd-Sourced 3D Reconstructions Using Semantic Detections

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00206

True Price, Johannes L. Schönberger, Zhen Wei, M. Pollefeys, Jan-Michael Frahm

Image-based 3D reconstruction from Internet photo collections has become a robust technology for producing impressive virtual representations of real-world scenes. However, Structure-from-Motion (SfM) pipelines still face several fundamental challenges:

- placing and reconstructing transient objects observed in only a single view;
- estimating the absolute scale of the scene;
- (surprisingly often) recovering the ground surfaces in the scene.

We propose a method that jointly addresses these open problems of SfM. In particular, we focus on detecting people in individual images and accurately placing them into an existing 3D model. As part of this placement, our method also estimates the absolute scale of the scene from object semantics, which here means the height distribution of the population. Further, we obtain a smooth approximation of the ground surface and recover the gravity vector of the scene directly from the individual person detections. We demonstrate our approach on a number of unordered Internet photo collections and quantitatively evaluate the estimated absolute scene scales.
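Below is a simplified sketch of scale estimation from person detections. Each detected person's reconstructed height, in arbitrary SfM units, votes for a metres-per-unit factor, and the median makes the estimate robust to outliers. The paper models the full height distribution of the population. The single assumed mean height of 1.70 m and the function name `estimate_metric_scale` are placeholders for illustration.

```python
from statistics import median
from typing import Iterable


def estimate_metric_scale(person_heights: Iterable[float], assumed_height_m: float = 1.70) -> float:
    """Hypothetical helper: metres per reconstruction unit from reconstructed person heights.

    Each detection votes with assumed_height_m / reconstructed_height; the median
    suppresses outliers such as mis-detections. The 1.70 m default is an assumption.
    """
    ratios = [assumed_height_m / h for h in person_heights if h > 0]
    if not ratios:
        raise ValueError("no valid person detections")
    return median(ratios)
```

With three people reconstructed at 0.85 units and one spurious detection at 10 units, the median ignores the outlier and returns a scale of 2 m per unit.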

Pages: 1926-1935

Citations: 7

Divide and Conquer for Full-Resolution Light Field Deblurring

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00672

M. Mohan, A. Rajagopalan

The increasing popularity of computational light field (LF) cameras has made it necessary to tackle motion blur, a ubiquitous phenomenon in hand-held photography. The state-of-the-art method for blind deblurring of LFs of general 3D scenes can handle only LFs downsampled in both spatial and angular resolution. This limitation comes from the computational overhead of processing a data-hungry, full-resolution 4D LF as a whole. Moreover, that method requires high-end GPUs for optimization and is ineffective for wide-angle settings and irregular camera motion. In this paper, we introduce a new blind motion deblurring strategy for LFs that significantly alleviates these limitations. Our model isolates 4D LF motion blur across the 2D subaperture images, so that each subaperture image can be deblurred independently. Furthermore, our model shares a common camera motion parameterization across the subaperture images. Consequently, blind deblurring of any single subaperture image enables cost-effective non-blind deblurring of the remaining subaperture images. Our approach is computationally efficient on a CPU and can effectively deblur full-resolution LFs.
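The abstract's key observation is that once blur has been estimated blindly for one subaperture image, the remaining images need only non-blind deblurring. The sketch below stands in for that non-blind step; it is not the authors' method. It applies Richardson-Lucy deconvolution to a 1D signal with a known, shared kernel. Real LF deblurring works on 2D images with a camera-motion model, and `convolve_same` and `richardson_lucy` are hypothetical helpers.

```python
from typing import List, Sequence


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Same'-size 1D convolution with zero padding (odd-length kernel assumed)."""
    half, n = len(kernel) // 2, len(signal)
    out = [0.0] * n
    for i in range(n):
        for j, k in enumerate(kernel):
            idx = i + half - j  # full-convolution index shifted to stay centred
            if 0 <= idx < n:
                out[i] += k * signal[idx]
    return out


def richardson_lucy(blurred: Sequence[float], kernel: Sequence[float],
                    iterations: int = 50, eps: float = 1e-12) -> List[float]:
    """Non-blind Richardson-Lucy deconvolution with a known kernel (generic technique)."""
    estimate = list(blurred)  # initialise with the observation
    flipped = list(kernel)[::-1]  # correlation = convolution with the flipped kernel
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        ratio = [b / (r + eps) for b, r in zip(blurred, reblurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate
```

In the LF setting this sketch mimics, the kernel would be estimated once (blindly) and then passed unchanged to the non-blind routine for every other view.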

Pages: 6421-6429

Citations: 8

A Common Framework for Interactive Texture Transfer

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00665

Yifang Men, Z. Lian, Yingmin Tang, Jianguo Xiao

In this paper, we present a general-purpose solution to interactive texture transfer problems that better preserves both local structure and visual richness. The problem is challenging because the tasks are diverse and the required user guidance is simple. The core idea of our common framework is to use multiple custom channels to dynamically guide the synthesis process. For interactivity, users can control the spatial distribution of stylized textures via semantic channels. The structure guidance is acquired in two stages: automatic extraction, then propagation, of structure information. It provides a prior for initialization and preserves salient structure by searching for nearest-neighbor fields (NNFs) with structure coherence. Meanwhile, texture coherence is exploited to keep the style similar to that of the source image. In addition, we leverage an improved PatchMatch with extended NNFs and matrix operations to obtain transformable source patches with richer geometric information at high speed. Extensive comparisons with state-of-the-art algorithms demonstrate the effectiveness and superiority of our method on a variety of scenes.
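For readers unfamiliar with nearest-neighbor fields, here is basic PatchMatch on 1D signals: random initialization, propagation and random search. It is a simplified sketch, not the paper's improved PatchMatch with extended NNFs and matrix operations. The function names and the 1D setting are assumptions.

```python
import random
from typing import Dict, Sequence, Tuple


def patch_cost(src: Sequence[float], tgt: Sequence[float], s: int, t: int, half: int) -> float:
    """Sum of squared differences between patches centred at s (source) and t (target)."""
    return sum((src[s + k] - tgt[t + k]) ** 2 for k in range(-half, half + 1))


def patchmatch_1d(src: Sequence[float], tgt: Sequence[float], half: int = 1,
                  iterations: int = 4, seed: int = 0) -> Tuple[Dict[int, int], Dict[int, float]]:
    """Hypothetical helper: approximate NNF mapping target patch centres to source centres."""
    rng = random.Random(seed)
    lo, hi = half, len(src) - half - 1
    centres = list(range(half, len(tgt) - half))
    nnf = {t: rng.randint(lo, hi) for t in centres}  # random initialisation
    cost = {t: patch_cost(src, tgt, nnf[t], t, half) for t in centres}

    def try_match(t: int, s: int) -> None:
        # Accept candidate s only if it is a valid centre and strictly lowers the cost.
        if lo <= s <= hi:
            c = patch_cost(src, tgt, s, t, half)
            if c < cost[t]:
                nnf[t], cost[t] = s, c

    for it in range(iterations):
        step = 1 if it % 2 == 0 else -1  # alternate scan direction each pass
        for t in (centres if step == 1 else reversed(centres)):
            if t - step in nnf:
                # Propagation: the neighbour's match, shifted by one, is a good guess here.
                try_match(t, nnf[t - step] + step)
            radius = hi - lo
            while radius >= 1:
                # Random search in an exponentially shrinking window.
                try_match(t, nnf[t] + rng.randint(-radius, radius))
                radius //= 2
    return nnf, cost
```

Because a candidate replaces the current match only when it strictly lowers the patch cost, each target's cost never increases across iterations.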

Pages: 6353-6362

Citations: 23

Local and Global Optimization Techniques in Graph-Based Clustering

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00364

Daiki Ikami, T. Yamasaki, K. Aizawa

Graph-based clustering divides a dataset into disjoint subsets of mutually similar members, given an affinity (similarity) matrix between data points. The most popular method for graph-based clustering is spectral clustering, but it has a drawback. It can only be applied to macro-average-based cost functions, which tend to produce undesirably small clusters. This study first introduces a novel cost function based on the micro-average. We then propose a local optimization method that is widely applicable to graph-based clustering cost functions. We also propose an initial-guess-free algorithm that removes the method's dependence on initialization. Moreover, we present two global optimization techniques. Experimental results show strong clustering performance for the proposed methods, including 100% clustering accuracy on the COIL-20 dataset.
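The macro- versus micro-average distinction can be illustrated with normalized association. These definitions are assumed, and the paper's exact cost functions may differ:

- The macro-average takes the mean over clusters of assoc(A_c, A_c) / vol(A_c), so every cluster counts equally regardless of size.
- The micro-average pools all clusters as Σ assoc(A_c, A_c) / Σ vol(A_c), so each cluster contributes in proportion to its volume.

The helper names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def cluster_terms(affinity: Matrix, labels: Sequence[int], k: int) -> Tuple[List[float], List[float]]:
    """Within-cluster association assoc(A_c, A_c) and volume vol(A_c) for each cluster c."""
    assoc, vol = [0.0] * k, [0.0] * k
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            w = affinity[i][j]
            vol[li] += w  # row sums accumulate into the cluster of node i
            if li == lj:
                assoc[li] += w
    return assoc, vol


def macro_association(affinity: Matrix, labels: Sequence[int], k: int) -> float:
    """Assumed macro-average: mean over clusters of assoc / vol (clusters weigh equally)."""
    assoc, vol = cluster_terms(affinity, labels, k)
    return sum(a / v for a, v in zip(assoc, vol) if v > 0) / k


def micro_association(affinity: Matrix, labels: Sequence[int], k: int) -> float:
    """Assumed micro-average: pooled assoc over pooled volume (large clusters weigh more)."""
    assoc, vol = cluster_terms(affinity, labels, k)
    return sum(assoc) / sum(vol)
```

In the test case, node 2 forms a singleton cluster with no internal association. That cluster drags the macro-average down to 0.4, while the micro-average is 2/3.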

Citations: 5

Two-Step Quantization for Low-bit Neural Networks

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00460

Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, Jian Cheng

Every bit matters in the hardware design of quantized neural networks. However, extremely low-bit representations usually cause a large drop in accuracy, so training extremely low-bit neural networks with high accuracy is of central importance. Most existing network quantization approaches learn transformations (low-bit weights) and encodings (low-bit activations) simultaneously. This tight coupling makes the optimization problem difficult and prevents the network from learning optimal representations. In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework that decomposes network quantization into two steps. The first is code learning, and the second is transformation function learning based on the learned codes. For the first step, we propose a sparse quantization method for code learning. The second step can be formulated as a non-linear least-squares regression problem with low-bit constraints, which can be solved efficiently in an iterative manner. Extensive experiments on the CIFAR-10 and ILSVRC-12 datasets show that TSQ is effective and outperforms the state of the art by a large margin. In particular, for 2-bit activation and ternary weight quantization of AlexNet, the accuracy of TSQ drops only about 0.5 points compared with its full-precision counterpart. This outperforms the current state of the art by more than 5 points.
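TSQ's second step is a least-squares problem with low-bit constraints that is solved iteratively. A closely related but much simpler problem shows why alternating closed-form updates work: fitting a weight vector w with a scale α times ternary codes t ∈ {-1, 0, +1}. For fixed α, the best code is the value in {-1, 0, 1} nearest to w/α. For fixed t, the least-squares scale is α = (w·t)/(t·t). This is a hypothetical illustration, not the TSQ algorithm itself, which fits a transformation that reproduces the learned activation codes.

```python
from typing import List, Sequence, Tuple


def fit_ternary(weights: Sequence[float], iterations: int = 10) -> Tuple[float, List[int]]:
    """Hypothetical helper: approximate w by alpha * t, t in {-1, 0, +1}, by alternating least squares."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # initial scale: mean magnitude
    codes = [0] * len(weights)
    if alpha == 0.0:
        return 0.0, codes
    for _ in range(iterations):
        # Codes step: for fixed alpha > 0, the best code is round(w / alpha) clipped to [-1, 1].
        codes = [max(-1, min(1, round(w / alpha))) for w in weights]
        support = sum(c * c for c in codes)
        if support == 0:
            break
        # Scale step: closed-form least-squares alpha for fixed codes.
        alpha = sum(w * c for w, c in zip(weights, codes)) / support
    return alpha, codes
```

Each step can only lower (or keep) the squared error, so the alternation converges; small weights such as 0.05 and -0.02 end up with code 0.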

Pages: 4376-4384

Citations: 113

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00376

Wenjie Luo, Bin Yang, R. Urtasun

In this paper, we propose a novel deep neural network that jointly reasons about 3D detection, tracking and motion forecasting, given data captured by a 3D sensor. Because it reasons about these tasks jointly, our holistic approach is more robust to occlusion and to sparse data at long range. Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world, which is very efficient in both memory and computation. Our experiments on a new, very large-scale dataset captured in several North American cities show that we outperform the state of the art by a large margin. Importantly, by sharing computation we can perform all tasks in as little as 30 ms.
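The bird's eye view input can be sketched with a common voxelization scheme, which is assumed here. LiDAR points inside a region of interest are quantized into ground-plane cells. Each cell stores a binary occupancy flag per height slice, so height becomes the channel dimension. The ranges, cell size and function name are hypothetical. Consecutive sweeps would then be stacked along a time axis as input to the spatio-temporal 3D convolutions.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[float, float]


def bev_occupancy(
    points: Iterable[Tuple[float, float, float]],
    x_range: Range,
    y_range: Range,
    z_range: Range,
    cell: float,
    z_bins: int,
) -> Dict[Tuple[int, int], List[int]]:
    """Hypothetical helper: sparse BEV grid mapping (ix, iy) to binary occupancy per height slice."""
    grid: Dict[Tuple[int, int], List[int]] = {}
    z_step = (z_range[1] - z_range[0]) / z_bins
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            continue  # outside the region of interest
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        iz = min(int((z - z_range[0]) // z_step), z_bins - 1)  # guard against rounding at the top
        grid.setdefault((ix, iy), [0] * z_bins)[iz] = 1
    return grid
```

A sparse dictionary keeps the sketch short. A real pipeline would use a dense tensor of shape (time, height slices, X cells, Y cells).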

Citations: 540
