Latest publications from the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.633
Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, A. Hengel, Javen Qinfeng Shi
Trace-norm regularization plays an important role in many areas such as computer vision and machine learning. When solving general large-scale trace-norm regularized problems, existing methods may be computationally expensive due to many high-dimensional truncated singular value decompositions (SVDs) or the unawareness of matrix ranks. In this paper, we propose a proximal Riemannian pursuit (PRP) paradigm which addresses a sequence of trace-norm regularized subproblems defined on nonlinear matrix varieties. To address the subproblem, we extend the proximal gradient method on vector space to nonlinear matrix varieties, in which the SVDs of intermediate solutions are maintained by cheap low-rank QR decompositions, therefore making the proposed method more scalable. Empirical studies on several tasks, such as matrix completion and low-rank representation based subspace clustering, demonstrate the competitive performance of the proposed paradigms over existing methods.
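The central operation in such trace-norm solvers is the proximal operator of the trace norm, which soft-thresholds singular values; PRP's contribution is to evaluate this step on a low-rank matrix variety with cheap QR-based updates instead of a full SVD. Below is a minimal NumPy sketch of the plain, full-SVD proximal step for orientation only (the function name and interface are illustrative, not the authors' code):

    import numpy as np

    def trace_norm_prox(Z, tau):
        # Proximal operator of tau * ||X||_* : soft-threshold the singular values.
        # PRP avoids this full SVD by keeping iterates in low-rank factored (QR)
        # form; this sketch shows only the textbook vector-space step.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)
        return (U * s_shrunk) @ Vt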
Citations: 5
Prior-Less Compressible Structure from Motion
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.447
Chen Kong, S. Lucey
Many non-rigid 3D structures are not modelled well through a low-rank subspace assumption. This is problematic when it comes to their reconstruction through Structure from Motion (SfM). We argue in this paper that a more expressive and general assumption can be made around compressible 3D structures. The vision community, however, has hitherto struggled to formulate effective strategies for recovering such structures after projection without the aid of additional priors (e.g. temporal ordering, rigid substructures, etc.). In this paper we present a "prior-less" approach to solve compressible SfM. Specifically, we demonstrate how the problem of SfM - assuming compressible 3D structures - can be theoretically characterized as a block sparse dictionary learning problem. We validate our approach experimentally by demonstrating reconstructions of 3D structures that are intractable using current state-of-the-art low-rank SfM approaches.
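Casting compressible SfM as block sparse dictionary learning means each recovered shape should be explained by a small number of dictionary blocks. The following is a hedged NumPy sketch of the block-selection step (a block orthogonal matching pursuit), with illustrative names and a fixed block budget; it is not the paper's full dictionary learning procedure:

    import numpy as np

    def block_omp(y, D, block_size, n_blocks):
        # Greedy block-sparse coding: repeatedly pick the dictionary block most
        # correlated with the residual, then refit y on all selected blocks.
        # D: (m, k*block_size) dictionary of k blocks; y: (m,) observation.
        blocks = [D[:, i:i + block_size] for i in range(0, D.shape[1], block_size)]
        selected, residual, coef = [], y.copy(), None
        for _ in range(n_blocks):
            scores = [-np.inf if i in selected else np.linalg.norm(B.T @ residual)
                      for i, B in enumerate(blocks)]
            selected.append(int(np.argmax(scores)))
            A = np.hstack([blocks[i] for i in selected])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            residual = y - A @ coef
        return selected, coef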
Citations: 50
Scale-Aware Alignment of Hierarchical Image Segmentation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.46
Yuhua Chen, Dengxin Dai, J. Pont-Tuset, L. Gool
Image segmentation is a key component in many computer vision systems, and it is recovering a prominent spot in the literature as methods improve and overcome their limitations. The outputs of most recent algorithms are in the form of a hierarchical segmentation, which provides segmentation at different scales in a single tree-like structure. Commonly, these hierarchical methods start from some low-level features, and are not aware of the scale information of the different regions in them. As such, one might need to work on many different levels of the hierarchy to find the objects in the scene. This work tries to modify existing hierarchical algorithms by improving their alignment, that is, by modifying the depth of the regions in the tree to better couple depth and scale. To do so, we first train a regressor to predict the scale of regions using mid-level features. We then define the anchor slice as the set of regions that best balances over-segmentation and under-segmentation. The output of our method is an improved hierarchy, re-aligned by the anchor slice. To demonstrate the power of our method, we perform comprehensive experiments, which show that our method, as a post-processing step, can significantly improve the quality of hierarchical segmentation representations, and ease the use of hierarchical image segmentation in high-level vision tasks such as object segmentation. We also show that the improvement generalizes well across different algorithms and datasets, with a low computational cost.
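Two ingredients recur in the description: a regressor that predicts a region's scale from mid-level features, and an anchor slice of regions balancing over- and under-segmentation. A hedged scikit-learn sketch of the regression part is given below; the feature extraction and the re-alignment of the hierarchy are assumed to exist elsewhere, and all names are illustrative:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def train_scale_regressor(region_features, region_scales):
        # region_features: (n_regions, n_feats) mid-level descriptors (assumed given)
        # region_scales:   (n_regions,) ground-truth region scales, e.g. sqrt(area)
        reg = RandomForestRegressor(n_estimators=100)
        reg.fit(region_features, np.log(region_scales))  # regress log-scale for stability
        return reg

    # predicted scales for new regions: np.exp(reg.predict(new_features))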
Citations: 30
MDL-CW: A Multimodal Deep Learning Framework with CrossWeights
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.285
Sarah Rastegar, M. Baghshah, H. Rabiee, Seyed Mohsen Shojaee
Deep learning has received much attention as one of the most powerful approaches for multimodal representation learning in recent years. An ideal model for multimodal data can reason about missing modalities using the available ones, and usually provides more information when multiple modalities are being considered. All the previous deep models contain separate modality-specific networks and find a shared representation on top of those networks. Therefore, they only consider high-level interactions between modalities to find a joint representation for them. In this paper, we propose a multimodal deep learning framework (MDL-CW) that exploits the cross weights between representations of modalities, and tries to gradually learn interactions of the modalities in a deep network manner (from low-level to high-level interactions). Moreover, we theoretically show that considering these interactions provides more intra-modality information, and introduce a multi-stage pre-training method that is based on the properties of multi-modal data. In the proposed framework, as opposed to the existing deep methods for multi-modal data, we try to reconstruct the representation of each modality at a given level, using the representation of other modalities in the previous layer. Extensive experimental results show that the proposed model outperforms state-of-the-art information retrieval methods for both image and text queries on the PASCAL-sentence and SUN-Attribute databases.
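The cross-weights idea is that each modality's next layer receives not only its own previous-layer activation but also a learned projection of the other modality's activation, so interactions are accumulated from low to high layers. A hedged PyTorch sketch of one such layer follows (module and dimension names are illustrative, not the authors' implementation):

    import torch.nn as nn

    class CrossWeightLayer(nn.Module):
        def __init__(self, dim_a, dim_b, hidden):
            super().__init__()
            self.wa = nn.Linear(dim_a, hidden)    # within-modality weights, modality A
            self.wb = nn.Linear(dim_b, hidden)    # within-modality weights, modality B
            self.cab = nn.Linear(dim_b, hidden)   # cross weights: B feeds A's next layer
            self.cba = nn.Linear(dim_a, hidden)   # cross weights: A feeds B's next layer
            self.act = nn.ReLU()

        def forward(self, ha, hb):
            # each modality sees its own features plus a cross-weighted projection
            # of the other modality's features from the previous layer
            return self.act(self.wa(ha) + self.cab(hb)), self.act(self.wb(hb) + self.cba(ha))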
Citations: 43
Face2Face: Real-Time Face Capture and Reenactment of RGB Videos
Pub Date : 2016-06-27 DOI: 10.1145/3292039
Justus Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, M. Nießner
We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where Youtube videos are reenacted in real time.
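One concrete step mentioned above is retrieving the mouth interior from the target sequence that best matches the re-targeted expression. A hedged sketch of that retrieval as a nearest-neighbour lookup in expression-coefficient space follows (the coefficient representation is assumed, and the subsequent warping and compositing are omitted):

    import numpy as np

    def retrieve_mouth_frame(target_expr_coeffs, retargeted_coeffs):
        # target_expr_coeffs: (n_frames, n_coeffs) expression coefficients of the
        # target sequence; retargeted_coeffs: (n_coeffs,) re-targeted expression.
        dists = np.linalg.norm(target_expr_coeffs - retargeted_coeffs, axis=1)
        return int(np.argmin(dists))  # index of the best-matching mouth frame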
Citations: 1552
The Global Patch Collider
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.21
Shenlong Wang, S. Fanello, Christoph Rhemann, S. Izadi, Pushmeet Kohli
This paper proposes a novel, extremely efficient, fully-parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches or their geometric or photometric transformed versions fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other, i.e., fall in the same empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state-of-the-art methods that are much more complex.
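Matching therefore reduces to grouping pixels by the tuple of leaves they reach across all trees and keeping only groups hit exactly once by each image. Below is a hedged sketch of that grouping step, assuming the per-pixel leaf indices from the learned trees are already computed (names are illustrative):

    from collections import defaultdict

    def unique_collisions(leaf_ids_a, leaf_ids_b):
        # leaf_ids_*: (num_pixels, num_trees) integer arrays of leaf indices reached
        # by each pixel of image A / image B when passed down the learned trees.
        buckets = defaultdict(lambda: ([], []))
        for i, key in enumerate(map(tuple, leaf_ids_a)):
            buckets[key][0].append(i)
        for j, key in enumerate(map(tuple, leaf_ids_b)):
            buckets[key][1].append(j)
        # a unique collision is a bucket containing exactly one pixel from each image
        return [(a[0], b[0]) for a, b in buckets.values() if len(a) == 1 and len(b) == 1]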
Citations: 34
Monocular Depth Estimation Using Neural Regression Forest
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.594
Anirban Roy, S. Todorovic
This paper presents a novel deep architecture, called neural regression forest (NRF), for depth estimation from a single image. NRF combines random forests and convolutional neural networks (CNNs). Scanning windows extracted from the image represent samples which are passed down the trees of NRF for predicting their depth. At every tree node, the sample is filtered with a CNN associated with that node. Results of the convolutional filtering are passed to left and right children nodes, i.e., corresponding CNNs, with a Bernoulli probability, until the leaves, where depth estimations are made. CNNs at every node are designed to have fewer parameters than seen in recent work, but their stacked processing along a path in the tree effectively amounts to a deeper CNN. NRF allows for parallelizable training of all "shallow" CNNs, and efficient enforcing of smoothness in depth estimation results. Our evaluation on the benchmark Make3D and NYUv2 datasets demonstrates that NRF outperforms the state of the art, and gracefully handles gradually decreasing training datasets.
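Prediction in an NRF tree amounts to soft Bernoulli routing: the CNN at each node outputs a probability of going left, and the depth estimate is the path-probability-weighted combination of leaf predictions. A hedged recursive sketch with an illustrative node structure follows (fields and names are assumptions, not the authors' code):

    def nrf_predict(x, node, prob=1.0):
        # node.left / node.right: child nodes, or None at a leaf
        # node.cnn(x): routing probability of sending sample x to the left child
        # node.depth: depth prediction stored at a leaf
        if node.left is None:
            return prob * node.depth          # leaf contribution, weighted by path probability
        p = node.cnn(x)                       # Bernoulli routing probability from this node's CNN
        return (nrf_predict(x, node.left, prob * p)
                + nrf_predict(x, node.right, prob * (1.0 - p)))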
Citations: 292
Canny Text Detector: Fast and Robust Scene Text Localization Algorithm
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.388
Hojin Cho, Myung-Chul Sung, Bongjin Jun
This paper presents a novel scene text detection algorithm, Canny Text Detector, which takes advantage of the similarity between image edge and text for effective text localization with improved recall rate. As closely related edge pixels construct the structural information of an object, we observe that cohesive characters compose a meaningful word/sentence sharing similar properties such as spatial location, size, color, and stroke width regardless of language. However, prevalent scene text detection approaches have not fully utilized such similarity, but mostly rely on the characters classified with high confidence, leading to low recall rate. By exploiting the similarity, our approach can quickly and robustly localize a variety of texts. Inspired by the original Canny edge detector, our algorithm makes use of double threshold and hysteresis tracking to detect texts of low confidence. Experimental results on public datasets demonstrate that our algorithm outperforms the state-of the-art scene text detection methods in terms of detection rate.
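The Canny-style component is the double threshold with hysteresis: candidates above the high threshold are kept outright, and weaker candidates survive only if they are connected (through similarity of location, size, colour, stroke width) to a strong one. A hedged sketch over a candidate adjacency graph, with illustrative thresholds and data structures:

    from collections import deque

    def hysteresis_select(scores, neighbours, t_high=0.8, t_low=0.4):
        # scores: dict candidate_id -> character classifier confidence
        # neighbours: dict candidate_id -> iterable of adjacent candidate ids
        keep = {i for i, s in scores.items() if s >= t_high}   # strong candidates
        queue = deque(keep)
        while queue:
            i = queue.popleft()
            for j in neighbours.get(i, ()):
                # a weak candidate is kept only if linked to an already-kept one
                if j not in keep and scores.get(j, 0.0) >= t_low:
                    keep.add(j)
                    queue.append(j)
        return keep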
Citations: 104
Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.81
Rong Quan, Junwei Han, Dingwen Zhang, F. Nie
Aiming at automatically discovering the common objects contained in a set of relevant images and simultaneously segmenting them as foreground, object co-segmentation has become an active research topic in recent years. Although a number of approaches have been proposed to address this problem, many of them are designed with misleading assumptions, unscalable priors, or low flexibility and thus still suffer from certain limitations, which reduces their capability in real-world scenarios. To alleviate these limitations, we propose a novel two-stage co-segmentation framework, which introduces the weak background prior to establish a globally closed-loop graph to represent the common object and union background separately. Then a novel graph optimized-flexible manifold ranking algorithm is proposed to flexibly optimize the graph connection and node labels to co-segment the common objects. Experiments on three image datasets demonstrate that our method outperforms other state-of-the-art methods.
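The ranking backbone is classical manifold ranking on a region graph, which has a closed-form solution; the paper's graph optimized-flexible variant additionally optimizes the graph itself, which the hedged sketch below omits (it assumes a precomputed affinity matrix with nonzero degrees):

    import numpy as np

    def manifold_ranking(W, y, alpha=0.99):
        # W: (n, n) affinity matrix over regions/superpixels; y: (n,) initial labels
        d = W.sum(axis=1)
        S = W / np.sqrt(np.outer(d, d))                        # symmetrically normalised affinity
        return np.linalg.solve(np.eye(len(y)) - alpha * S, y)  # closed-form ranking scores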
Citations: 79
Structural Correlation Filter for Robust Visual Tracking
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.467
Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking. The proposed SCF model takes part-based tracking strategies into account in a correlation filter tracker, and exploits circular shifts of all parts for their motion modeling to preserve target object structure. Compared with existing correlation filter trackers, our proposed tracker has several advantages: (1) Due to the part strategy, the learned structural correlation filters are less sensitive to partial occlusion, and have computational efficiency and robustness. (2) The learned filters are able to not only distinguish the parts from the background as the traditional correlation filters, but also exploit the intrinsic relationship among local parts via spatial constraints to preserve object structure. (3) The learned correlation filters not only make most parts share similar motion, but also tolerate outlier parts that have different motion. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed SCF tracking algorithm performs favorably against several state-of-the-art methods.
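The single-part building block of such a tracker is a correlation filter that can be learned in closed form in the Fourier domain. A hedged MOSSE-style sketch of that building block follows; the part-based structure, circular-shift modeling, and spatial constraints that define SCF are omitted, and all names are illustrative:

    import numpy as np

    def train_correlation_filter(patch, target_response, lam=1e-2):
        # patch: (h, w) grayscale training patch; target_response: (h, w) desired
        # Gaussian-shaped response peaked at the object centre.
        F = np.fft.fft2(patch)
        G = np.fft.fft2(target_response)
        return (G * np.conj(F)) / (F * np.conj(F) + lam)   # closed-form ridge solution

    def locate(H, patch):
        response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
        return np.unravel_index(np.argmax(response), response.shape)  # response peak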
Citations: 165