The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging
Chia-Yin Tsai, Kiriakos N. Kutulakos, S. Narasimhan, Aswin C. Sankaranarayanan
Non-line-of-sight (NLOS) imaging utilizes the full 5D light transient measurements to reconstruct scenes beyond the camera's field of view. Mathematically, this requires solving an elliptical tomography problem that unmixes the shape and albedo from spatially-multiplexed measurements of the NLOS scene. In this paper, we propose a new approach for NLOS imaging by studying the properties of first-returning photons from three-bounce light paths. We show that the times of flight of first-returning photons depend only on the geometry of the NLOS scene, and that each observation is almost always generated from a single NLOS scene point. Exploiting these properties, we derive a space carving algorithm for NLOS scenes. In addition, by assuming local planarity, we derive an algorithm to localize NLOS scene points in 3D and estimate their surface normals. Our methods require neither the full transient measurements nor solving the hard elliptical tomography problem. We demonstrate the effectiveness of our methods through simulations as well as real data captured from a SPAD sensor.
{"title":"The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging","authors":"Chia-Yin Tsai, Kiriakos N. Kutulakos, S. Narasimhan, Aswin C. Sankaranarayanan","doi":"10.1109/CVPR.2017.251","DOIUrl":"https://doi.org/10.1109/CVPR.2017.251","url":null,"abstract":"Non-line-of-sight (NLOS) imaging utilizes the full 5D light transient measurements to reconstruct scenes beyond the cameras field of view. Mathematically, this requires solving an elliptical tomography problem that unmixes the shape and albedo from spatially-multiplexed measurements of the NLOS scene. In this paper, we propose a new approach for NLOS imaging by studying the properties of first-returning photons from three-bounce light paths. We show that the times of flight of first-returning photons are dependent only on the geometry of the NLOS scene and each observation is almost always generated from a single NLOS scene point. Exploiting these properties, we derive a space carving algorithm for NLOS scenes. In addition, by assuming local planarity, we derive an algorithm to localize NLOS scene points in 3D and estimate their surface normals. Our methods do not require either the full transient measurements or solving the hard elliptical tomography problem. We demonstrate the effectiveness of our methods through simulations as well as real data captured from a SPAD sensor.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"74 1","pages":"2336-2344"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88772605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero Shot Learning via Multi-scale Manifold Regularization
Shay Deutsch, Soheil Kolouri, Kyungnam Kim, Y. Owechko, Stefano Soatto
We address zero-shot learning using a new manifold alignment framework based on a localized multi-scale transform on graphs. Our inference approach includes a smoothness criterion for a function mapping nodes on a graph (visual representation) onto a linear space (semantic representation), which we optimize using multi-scale graph wavelets. The robustness of the ensuing scheme allows us to operate with automatically generated semantic annotations, resulting in an algorithm that is entirely free of manual supervision, and yet improves the state-of-the-art as measured on benchmark datasets.
{"title":"Zero Shot Learning via Multi-scale Manifold Regularization","authors":"Shay Deutsch, Soheil Kolouri, Kyungnam Kim, Y. Owechko, Stefano Soatto","doi":"10.1109/CVPR.2017.562","DOIUrl":"https://doi.org/10.1109/CVPR.2017.562","url":null,"abstract":"We address zero-shot learning using a new manifold alignment framework based on a localized multi-scale transform on graphs. Our inference approach includes a smoothness criterion for a function mapping nodes on a graph (visual representation) onto a linear space (semantic representation), which we optimize using multi-scale graph wavelets. The robustness of the ensuing scheme allows us to operate with automatically generated semantic annotations, resulting in an algorithm that is entirely free of manual supervision, and yet improves the state-of-the-art as measured on benchmark datasets.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"30 1","pages":"5292-5299"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78799092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discretely Coding Semantic Rank Orders for Supervised Image Hashing
Li Liu, Ling Shao, Fumin Shen, Mengyang Yu
Learning to hash is widely recognized as a way to achieve highly efficient storage and retrieval of large-scale visual data. In particular, ranking-based hashing techniques have recently attracted broad research attention because they explicitly optimize the ranking accuracy of the retrieved data, an objective better matched to realistic search tasks. However, directly optimizing discrete hash codes, without continuous relaxations, under a nonlinear ranking objective is infeasible for traditional optimization methods and even for recent discrete hashing algorithms. To address this challenging issue, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to embed semantic rank orders directly into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes under the quadratic nonlinear ranking objective in an iterative manner, with guaranteed fast convergence. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate listwise rank orders from high-level semantic word embeddings, which quantitatively capture the intrinsic correlation between different categories. We evaluate DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results show that DSeRH outperforms state-of-the-art ranking-based hashing methods.
{"title":"Discretely Coding Semantic Rank Orders for Supervised Image Hashing","authors":"Li Liu, Ling Shao, Fumin Shen, Mengyang Yu","doi":"10.1109/CVPR.2017.546","DOIUrl":"https://doi.org/10.1109/CVPR.2017.546","url":null,"abstract":"Learning to hash has been recognized to accomplish highly efficient storage and retrieval for large-scale visual data. Particularly, ranking-based hashing techniques have recently attracted broad research attention because ranking accuracy among the retrieved data is well explored and their objective is more applicable to realistic search tasks. However, directly optimizing discrete hash codes without continuous-relaxations on a nonlinear ranking objective is infeasible by either traditional optimization methods or even recent discrete hashing algorithms. To address this challenging issue, in this paper, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes with the quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate the listwise rank orders from the high-level semantic word embeddings which can quantitatively capture the intrinsic correlation between different categories. We evaluate our DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results manifest that DSeRH can outperform the state-of-the-art ranking-based hashing methods.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"44 25 1","pages":"5140-5149"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72639515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework
Jongyoo Kim, Sanghoon Lee
Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural network (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), in which the behavior of the HVS is learned from the underlying data distribution of IQA databases. Different from previous studies, our model seeks the optimal visual weighting from the data itself, without any prior knowledge of the HVS. Through experiments, we show that the predicted visual sensitivity maps agree with human subjective opinions. In addition, DeepQA achieves state-of-the-art prediction accuracy among FR-IQA models.
{"title":"Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework","authors":"Jongyoo Kim, Sanghoon Lee","doi":"10.1109/CVPR.2017.213","DOIUrl":"https://doi.org/10.1109/CVPR.2017.213","url":null,"abstract":"Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural networks (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), where the behavior of the HVS is learned from the underlying data distribution of IQA databases. Different from previous studies, our model seeks the optimal visual weight based on understanding of database information itself without any prior knowledge of the HVS. Through the experiments, we show that the predicted visual sensitivity maps agree with the human subjective opinions. In addition, DeepQA achieves the state-of-the-art prediction accuracy among FR-IQA models.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"32 1","pages":"1969-1977"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81250728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grassmannian Manifold Optimization Assisted Sparse Spectral Clustering
Qiong Wang, Junbin Gao, Hong Li
Spectral clustering is one of the pioneering clustering methods in machine learning and pattern recognition. It relies on a spectral decomposition criterion to learn a low-dimensional embedding of the data for a basic clustering algorithm such as k-means. The recent sparse spectral clustering (SSC) introduces sparsity of the similarity in the low-dimensional space by enforcing a sparsity-inducing penalty, resulting in a non-convex optimization; the solution is computed from a relaxed convex problem via the standard ADMM (Alternating Direction Method of Multipliers), rather than by inferring the latent representation from the eigen-structure. This paper provides a direct solution by formulating a new optimization problem on the Grassmann manifold. In this way, computing the latent embedding becomes part of an optimization over manifolds, and recently developed manifold optimization methods can be applied. It turns out that the learned features are not only very informative for clustering, but also more intuitive and effective for visualization after dimensionality reduction. We conduct empirical studies on simulated datasets and several real-world benchmark datasets to validate the proposed methods. Experimental results exhibit the effectiveness of this new manifold-based clustering and dimensionality reduction method.
{"title":"Grassmannian Manifold Optimization Assisted Sparse Spectral Clustering","authors":"Qiong Wang, Junbin Gao, Hong Li","doi":"10.1109/CVPR.2017.335","DOIUrl":"https://doi.org/10.1109/CVPR.2017.335","url":null,"abstract":"Spectral Clustering is one of pioneered clustering methods in machine learning and pattern recognition field. It relies on the spectral decomposition criterion to learn a low-dimensonal embedding of data for a basic clustering algorithm such as the k-means. The recent sparse Spectral clustering (SSC) introduces the sparsity for the similarity in low-dimensional space by enforcing a sparsity-induced penalty, resulting a non-convex optimization, and the solution is calculated through a relaxed convex problem via the standard ADMM (Alternative Direction Method of Multipliers), rather than inferring latent representation from eigen-structure. This paper provides a direct solution as solving a new Grassmann optimization problem. By this way calculating latent embedding becomes part of optmization on manifolds and the recently developed manifold optimization methods can be applied. It turns out the learned new features are not only very informative for clustering, but also more intuitive and effective in visualization after dimensionality reduction. We conduct empirical studies on simulated datasets and several real-world benchmark datasets to validate the proposed methods. Experimental results exhibit the effectiveness of this new manifold-based clustering and dimensionality reduction method.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"2 1","pages":"3145-3153"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79922376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
Junho Yim, Donggyu Joo, Ji-Hoon Bae, Junmo Kim
We introduce a novel technique for knowledge transfer, where knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. As a DNN maps from the input space to the output space through many layers sequentially, we define the distilled knowledge to be transferred in terms of the flow between layers, which is calculated as the inner product between features from two layers. When we compare the student DNN with an original network of the same size trained without a teacher, the proposed method of transferring the distilled knowledge as the flow between two layers exhibits three important phenomena: (1) the student DNN that learns the distilled knowledge is optimized much faster than the original model; (2) the student DNN outperforms the original DNN; and (3) the student DNN can learn the distilled knowledge from a teacher DNN trained on a different task, and the student DNN outperforms the original DNN trained from scratch.
{"title":"A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning","authors":"Junho Yim, Donggyu Joo, Ji-Hoon Bae, Junmo Kim","doi":"10.1109/CVPR.2017.754","DOIUrl":"https://doi.org/10.1109/CVPR.2017.754","url":null,"abstract":"We introduce a novel technique for knowledge transfer, where knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. As the DNN performs a mapping from the input space to the output space through many layers sequentially, we define the distilled knowledge to be transferred in terms of flow between layers, which is calculated by computing the inner product between features from two layers. When we compare the student DNN and the original network with the same size as the student DNN but trained without a teacher network, the proposed method of transferring the distilled knowledge as the flow between two layers exhibits three important phenomena: (1) the student DNN that learns the distilled knowledge is optimized much faster than the original model, (2) the student DNN outperforms the original DNN, and (3) the student DNN can learn the distilled knowledge from a teacher DNN that is trained at a different task, and the student DNN outperforms the original DNN that is trained from scratch.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"40 1","pages":"7130-7138"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91203950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images
Liuhao Ge, Hui Liang, Junsong Yuan, D. Thalmann
We propose a simple yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image-based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN, which takes a 3D volumetric representation of the hand depth image as input, can capture the 3D spatial structure of the input and accurately regress the full 3D hand pose in a single pass. To make the 3D CNN robust to variations in hand size and global orientation, we perform 3D data augmentation on the training data. Experiments show that our 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient: our implementation runs at over 215 fps on a standard computer with a single GPU.
{"title":"3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images","authors":"Liuhao Ge, Hui Liang, Junsong Yuan, D. Thalmann","doi":"10.1109/CVPR.2017.602","DOIUrl":"https://doi.org/10.1109/CVPR.2017.602","url":null,"abstract":"We propose a simple, yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN taking a 3D volumetric representation of the hand depth image as input can capture the 3D spatial structure of the input and accurately regress full 3D hand pose in a single pass. In order to make the 3D CNN robust to variations in hand sizes and global orientations, we perform 3D data augmentation on the training data. Experiments show that our proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as our implementation runs at over 215 fps on a standard computer with a single GPU.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"5679-5688"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91343156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-Shot Classification with Discriminative Semantic Representation Learning
Meng Ye, Yuhong Guo
Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic components across the two domains via non-negative sparse matrix factorization, while enforcing the representation vectors of the images in this common component-based space to be discriminatively aligned with the attribute-based label representation vectors. To fully exploit the aligned semantic information contained in the learned representation vectors of the instances, we develop a label propagation based testing procedure to classify the unlabeled instances from the unseen classes in the target domain. We conduct experiments on four standard zero-shot learning image datasets, by comparing the proposed approach to the state-of-the-art zero-shot learning methods. The empirical results demonstrate the efficacy of the proposed approach.
{"title":"Zero-Shot Classification with Discriminative Semantic Representation Learning","authors":"Meng Ye, Yuhong Guo","doi":"10.1109/CVPR.2017.542","DOIUrl":"https://doi.org/10.1109/CVPR.2017.542","url":null,"abstract":"Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic components across the two domains via non-negative sparse matrix factorization, while enforcing the representation vectors of the images in this common component-based space to be discriminatively aligned with the attribute-based label representation vectors. To fully exploit the aligned semantic information contained in the learned representation vectors of the instances, we develop a label propagation based testing procedure to classify the unlabeled instances from the unseen classes in the target domain. We conduct experiments on four standard zero-shot learning image datasets, by comparing the proposed approach to the state-of-the-art zero-shot learning methods. The empirical results demonstrate the efficacy of the proposed approach.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"55 1","pages":"5103-5111"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91031571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Visual Tracking Using Oblique Random Forests
Le Zhang, Jagannadan Varadarajan, P. N. Suganthan, N. Ahuja, P. Moulin
Random forest has emerged as a powerful classification technique with promising results in various vision tasks, including image classification, pose estimation and object detection. However, current techniques have shown little improvement in visual tracking, as they mostly rely on piecewise orthogonal hyperplanes to create decision nodes and lack the robust incremental learning mechanism that online tracking requires. In this paper, we propose a discriminative tracker based on a novel incremental oblique random forest. Unlike conventional orthogonal decision trees that use a single feature and heuristic measures to obtain a split at each node, we propose to use a more powerful proximal SVM to obtain oblique hyperplanes that better capture the geometric structure of the data. The resulting decision surface is not restricted to be axis-aligned and hence can represent and classify the input data better. Furthermore, in order to generalize to online tracking scenarios, we derive incremental update steps that enable the hyperplanes in each node to be updated recursively, efficiently and in closed form. We demonstrate the effectiveness of our method on two large-scale benchmark datasets (OTB-51 and OTB-100) and show that it gives competitive results on several challenging cases, both with simple HOG features and in combination with more sophisticated deep neural network based models. Implementations of the proposed random forest are available at https://github.com/ZhangLeUestc/Incremental-Oblique-Random-Forest.
{"title":"Robust Visual Tracking Using Oblique Random Forests","authors":"Le Zhang, Jagannadan Varadarajan, P. N. Suganthan, N. Ahuja, P. Moulin","doi":"10.1109/CVPR.2017.617","DOIUrl":"https://doi.org/10.1109/CVPR.2017.617","url":null,"abstract":"Random forest has emerged as a powerful classification technique with promising results in various vision tasks including image classification, pose estimation and object detection. However, current techniques have shown little improvements in visual tracking as they mostly rely on piece wise orthogonal hyperplanes to create decision nodes and lack a robust incremental learning mechanism that is much needed for online tracking. In this paper, we propose a discriminative tracker based on a novel incremental oblique random forest. Unlike conventional orthogonal decision trees that use a single feature and heuristic measures to obtain a split at each node, we propose to use a more powerful proximal SVM to obtain oblique hyperplanes to capture the geometric structure of the data better. The resulting decision surface is not restricted to be axis aligned and hence has the ability to represent and classify the input data better. Furthermore, in order to generalize to online tracking scenarios, we derive incremental update steps that enable the hyperplanes in each node to be updated recursively, efficiently and in a closed-form fashion. We demonstrate the effectiveness of our method using two large scale benchmark datasets (OTB-51 and OTB-100) and show that our method gives competitive results on several challenging cases by relying on simple HOG features as well as in combination with more sophisticated deep neural network based models. The implementations of the proposed random forest are available at https://github.com/ZhangLeUestc/ Incremental-Oblique-Random-Forest.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"5825-5834"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89883445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild
Dafang He, X. Yang, Chen Liang, Zihan Zhou, Alexander Ororbia, Daniel Kifer, C. Lee Giles
Scene text detection has attracted great attention in recent years. Text potentially exists in a wide variety of images and videos and plays an important role in understanding the scene. In this paper, we present a novel text detection algorithm composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions; (2) a novel instance (word or line) aware segmentation is designed to further remove false positives and obtain word instances. The proposed algorithm can accurately localize words or text lines in arbitrary orientations, including curved text lines, which many other frameworks cannot handle. Our algorithm achieves state-of-the-art performance on the ICDAR 2013 (IC13), ICDAR 2015 (IC15), CUTE80 and Street View Text (SVT) benchmark datasets.
{"title":"Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild","authors":"Dafang He, X. Yang, Chen Liang, Zihan Zhou, Alexander Ororbia, Daniel Kifer, C. Lee Giles","doi":"10.1109/CVPR.2017.58","DOIUrl":"https://doi.org/10.1109/CVPR.2017.58","url":null,"abstract":"Scene text detection has attracted great attention these years. Text potentially exist in a wide variety of images or videos and play an important role in understanding the scene. In this paper, we present a novel text detection algorithm which is composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions, (2) a novel instance (word or line) aware segmentation is designed to further remove false positives and obtain word instances. The proposed algorithm can accurately localize word or text line in arbitrary orientations, including curved text lines which cannot be handled in a lot of other frameworks. Our algorithm achieved state-of-the-art performance in ICDAR 2013 (IC13), ICDAR 2015 (IC15) and CUTE80 and Street View Text (SVT) benchmark datasets.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"32 1","pages":"474-483"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87442347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}