
Latest publications: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes
Pub Date : 2017-11-09 DOI: 10.1109/CVPR.2017.592
A. Mustafa, A. Hilton
In this paper we propose a framework for spatially and temporally coherent semantic co-segmentation and reconstruction of complex dynamic scenes from multiple static or moving cameras. Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance. We demonstrate that semantic coherence results in improved segmentation and reconstruction for complex scenes. A joint formulation is proposed for semantically coherent object-based co-segmentation and reconstruction of scenes by enforcing consistent semantic labelling between views and over time. Semantic tracklets are introduced to enforce temporal coherence in semantic labelling and reconstruction between widely spaced instances of dynamic objects. Tracklets of dynamic objects enable unsupervised learning of appearance and shape priors that are exploited in joint segmentation and reconstruction. Evaluation on challenging indoor and outdoor sequences with hand-held moving cameras shows improved accuracy in segmentation, temporally coherent semantic labelling and 3D reconstruction of dynamic scenes.
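As a rough illustration of the cross-view label consistency described above, the sketch below (NumPy, hypothetical function and variable names) fuses per-view semantic class probabilities for pixels that are already in correspondence and returns a single shared label per pixel. It is only a stand-in for the paper's joint co-segmentation and reconstruction formulation.

```python
import numpy as np

def fuse_view_labels(prob_maps):
    """Fuse per-view semantic class probabilities for corresponding pixels.

    prob_maps: array of shape (num_views, num_pixels, num_classes) holding
    per-view softmax outputs for pixels already put into correspondence
    (e.g. via the reconstructed geometry).  Returns one label per pixel that
    is shared by all views, i.e. the kind of cross-view consistency the
    paper's joint formulation enforces.
    """
    log_probs = np.log(np.clip(prob_maps, 1e-8, 1.0))  # avoid log(0)
    fused = log_probs.sum(axis=0)                      # combine evidence from all views
    return fused.argmax(axis=1)                        # one consistent label per pixel

# Toy example: 3 views, 4 corresponding pixels, 5 semantic classes.
rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))
p /= p.sum(axis=2, keepdims=True)
print(fuse_view_labels(p))
```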
Citations: 48
FFTLasso: Large-Scale LASSO in the Fourier Domain
Pub Date : 2017-11-09 DOI: 10.1109/CVPR.2017.465
Adel Bibi, Hani Itani, Bernard Ghanem
In this paper, we revisit the LASSO sparse representation problem, which has been studied and used in a variety of different areas, ranging from signal processing and information theory to computer vision and machine learning. In the vision community, it found its way into many important applications, including face recognition, tracking, super resolution, image denoising, to name a few. Despite advances in efficient sparse algorithms, solving large-scale LASSO problems remains a challenge. To circumvent this difficulty, people tend to downsample and subsample the problem (e.g. via dimensionality reduction) to maintain a manageable sized LASSO, which usually comes at the cost of losing solution accuracy. This paper proposes a novel circulant reformulation of the LASSO that lifts the problem to a higher dimension, where ADMM can be efficiently applied to its dual form. Because of this lifting, all optimization variables are updated using only basic element-wise operations, the most computationally expensive of which is a 1D FFT. In this way, there is no need for a linear system solver nor matrix-vector multiplication. Since all operations in our FFTLasso method are element-wise, the subproblems are completely independent and can be trivially parallelized (e.g. on a GPU). The attractive computational properties of FFTLasso are verified by extensive experiments on synthetic and real data and on the face recognition task. They demonstrate that FFTLasso scales much more effectively than a state-of-the-art solver.
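The computational trick the circulant reformulation builds on is that multiplying by a circulant matrix is an element-wise product in the Fourier domain, so only 1D FFTs are needed. The sketch below (NumPy/SciPy) verifies that identity and shows the element-wise soft-thresholding step any LASSO proximal solver applies; it is not the paper's full ADMM-on-the-dual algorithm.

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
n = 8
d = rng.standard_normal(n)      # generator of a circulant dictionary
x = rng.standard_normal(n)      # a coefficient vector

# Dense circulant matrix-vector product ...
dense = circulant(d) @ x
# ... equals an element-wise product in the Fourier domain (1D FFTs only).
fft_based = np.fft.ifft(np.fft.fft(d) * np.fft.fft(x)).real
print(np.allclose(dense, fft_based))   # True

# The LASSO proximal step that an ADMM/ISTA-style solver applies
# element-wise to the coefficients (soft-thresholding).
def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(soft_threshold(x, 0.5))
```

Because both the matrix product and the proximal step are element-wise, every coordinate can be updated independently, which is what makes the formulation easy to parallelize on a GPU.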
Citations: 3
Coarse-to-Fine Segmentation with Shape-Tailored Continuum Scale Spaces
Pub Date : 2017-11-09 DOI: 10.1109/CVPR.2017.188
Naeemullah Khan, Byung-Woo Hong, A. Yezzi, G. Sundaramoorthi
We formulate an energy for segmentation that is designed to have preference for segmenting the coarse over fine structure of the image, without smoothing across boundaries of regions. The energy is formulated by integrating a continuum of scales from a scale space computed from the heat equation within regions. We show that the energy can be optimized without computing a continuum of scales, but instead from a single scale. This makes the method computationally efficient in comparison to energies using a discrete set of scales. We apply our method to texture and motion segmentation. Experiments on benchmark datasets show that a continuum of scales leads to better segmentation accuracy over discrete scales and other competing methods.
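For intuition, running the heat equation on an image for time t is equivalent to Gaussian smoothing with sigma = sqrt(2t), so a stack of increasing smoothing levels approximates a sampled heat-equation scale space. The sketch below (SciPy, hypothetical function name) builds such a plain isotropic stack; the paper's shape-tailored version instead solves the heat equation inside each region so that smoothing never crosses region boundaries.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heat_scale_space(image, times):
    """Stack of solutions of the heat equation u_t = Δu at the given times.

    Running the heat equation for time t on the whole image equals Gaussian
    smoothing with sigma = sqrt(2 t).  The paper solves it *inside* each
    region (shape-tailored), which this plain version ignores.
    """
    return np.stack([gaussian_filter(image, sigma=np.sqrt(2.0 * t)) for t in times])

img = np.random.default_rng(0).random((64, 64))
scales = heat_scale_space(img, times=[0.5, 2.0, 8.0, 32.0])
print(scales.shape)   # (4, 64, 64): per-pixel descriptors from coarse to fine
```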
Citations: 8
Multi-way Multi-level Kernel Modeling for Neuroimaging Classification
Pub Date : 2017-11-06 DOI: 10.1109/CVPR.2017.724
Lifang He, Chun-Ta Lu, Hao Ding, Shen Wang, L. Shen, Philip S. Yu, A. Ragin
Owing to prominence as a diagnostic tool for probing the neural correlates of cognition, neuroimaging tensor data has been the focus of intense investigation. Although many supervised tensor learning approaches have been proposed, they either cannot capture the nonlinear relationships of tensor data or cannot preserve the complex multi-way structural information. In this paper, we propose a Multi-way Multi-level Kernel (MMK) model that can extract discriminative, nonlinear and structural preserving representations of tensor data. Specifically, we introduce a kernelized CP tensor factorization technique, which is equivalent to performing the low-rank tensor factorization in a possibly much higher dimensional space that is implicitly defined by the kernel function. We further employ a multi-way nonlinear feature mapping to derive the dual structural preserving kernels, which are used in conjunction with kernel machines (e.g., SVM). Extensive experiments on real-world neuroimages demonstrate that the proposed MMK method can effectively boost the classification performance on diverse brain disorders (i.e., Alzheimers disease, ADHD, and HIV).
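As a loose illustration of "per-mode kernels combined and fed to a kernel machine," the sketch below (NumPy + scikit-learn, all names hypothetical) builds one RBF kernel per tensor mode, combines them by product, and plugs the result into a precomputed-kernel SVM on synthetic data. It is only a stand-in; the paper's actual construction goes through a kernelized CP tensor factorization.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(a, b, gamma):
    # Pairwise RBF kernel between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def multiway_product_kernel(tensors_a, tensors_b, gamma=0.1):
    """Toy multi-way kernel: one RBF kernel per tensor mode, combined by product.

    tensors_*: arrays of shape (num_samples, I, J), i.e. 3rd-order tensors with
    the sample axis first.  Mode-wise profiles + per-mode kernels are only a
    crude proxy for the paper's kernelized CP factorization.
    """
    k1 = rbf(tensors_a.mean(axis=2), tensors_b.mean(axis=2), gamma)  # mode-1 profile
    k2 = rbf(tensors_a.mean(axis=1), tensors_b.mean(axis=1), gamma)  # mode-2 profile
    return k1 * k2

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8, 6))     # 40 synthetic "neuroimaging" tensors
y = rng.integers(0, 2, 40)              # binary diagnosis labels
K = multiway_product_kernel(X, X)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(multiway_product_kernel(X, X), y))
```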
Citations: 18
Joint Gap Detection and Inpainting of Line Drawings
Pub Date : 2017-11-06 DOI: 10.1109/CVPR.2017.611
Kazuma Sasaki, S. Iizuka, E. Simo-Serra, H. Ishikawa
We propose a novel data-driven approach for automatically detecting and completing gaps in line drawings with a Convolutional Neural Network. In the case of existing inpainting approaches for natural images, masks indicating the missing regions are generally required as input. Here, we show that line drawings have enough structures that can be learned by the CNN to allow automatic detection and completion of the gaps without any such input. Thus, our method can find the gaps in line drawings and complete them without user interaction. Furthermore, the completion realistically conserves thickness and curvature of the line segments. All the necessary heuristics for such realistic line completion are learned naturally from a dataset of line drawings, where various patterns of line completion are generated on the fly as training pairs to improve the model generalization. We evaluate our method qualitatively on a diverse set of challenging line drawings and also provide quantitative results with a user study, where it significantly outperforms the state of the art.
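To make the setup concrete, the sketch below (PyTorch, hypothetical class name) is a minimal fully convolutional model that maps a 1-channel line drawing with gaps to a completed 1-channel drawing; training pairs would be produced by erasing random gaps from clean drawings. The paper's actual architecture and its on-the-fly gap synthesis are more elaborate.

```python
import torch
import torch.nn as nn

class GapInpainter(nn.Module):
    """Minimal fully convolutional sketch: 1-channel line drawing in,
    completed 1-channel drawing out.  Not the paper's network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),          # downsample
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(), # upsample
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),                  # ink probability
        )

    def forward(self, x):
        return self.net(x)

# Training pairs would be made by erasing random gaps from clean drawings.
model = GapInpainter()
corrupted = torch.rand(2, 1, 128, 128)   # drawings with synthetic gaps
completed = model(corrupted)
print(completed.shape)                   # torch.Size([2, 1, 128, 128])
```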
Citations: 37
Wetness and Color from a Single Multispectral Image
Pub Date : 2017-11-06 DOI: 10.1109/CVPR.2017.42
Mihoko Shimano, Hiroki Okawa, Yuta Asano, Ryoma Bise, K. Nishino, Imari Sato
Visual recognition of wet surfaces and their degrees of wetness is important for many computer vision applications. It can inform slippery spots on a road to autonomous vehicles, muddy areas of a trail to humanoid robots, and the freshness of groceries to us. In the past, monochromatic appearance change, the fact that surfaces darken when wet, has been modeled to recognize wet surfaces. In this paper, we show that color change, particularly in its spectral behavior, carries rich information about a wet surface. We derive an analytical spectral appearance model of wet surfaces that expresses the characteristic spectral sharpening due to multiple scattering and absorption in the surface. We derive a novel method for estimating key parameters of this spectral appearance model, which enables the recovery of the original surface color and the degree of wetness from a single observation. Applied to a multispectral image, the method estimates the spatial map of wetness together with the dry spectral distribution of the surface. To our knowledge, this work is the first to model and leverage the spectral characteristics of wet surfaces to revert its appearance. We conduct comprehensive experimental validation with a number of wet real surfaces. The results demonstrate the accuracy of our model and the effectiveness of our method for surface wetness and color estimation.
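As a heavily simplified toy version of the idea (not the paper's analytical model), one can treat the wet spectrum as the dry spectrum raised to an effective power, mimicking extra internal bounces that both darken and spectrally sharpen the surface, and fit that exponent per pixel as a crude wetness proxy. The NumPy sketch below, with hypothetical names and synthetic data, does exactly that.

```python
import numpy as np

def fit_wetness_exponent(observed, dry, eps=1e-6):
    """Toy per-pixel wetness proxy (NOT the paper's derived spectral model).

    Assume observed(λ) ≈ dry(λ)**n, i.e. light bounces inside the wet layer
    roughly n times, darkening and spectrally sharpening the surface.
    Fitting n in log space gives a crude degree-of-wetness map.

    observed, dry: arrays of shape (num_pixels, num_bands) with values in [0, 1].
    """
    lo = np.log(np.clip(observed, eps, 1.0))
    ld = np.log(np.clip(dry, eps, 1.0))
    # Least-squares slope of lo against ld per pixel: n = <lo, ld> / <ld, ld>.
    return (lo * ld).sum(axis=1) / (ld * ld).sum(axis=1)

rng = np.random.default_rng(0)
dry = rng.uniform(0.2, 0.9, size=(5, 31))          # 31 spectral bands
wet = dry ** rng.uniform(1.0, 3.0, size=(5, 1))    # synthetic wet observations
print(fit_wetness_exponent(wet, dry))              # recovers the exponents used above
```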
Citations: 5
Learning Deep Context-Aware Features over Body and Latent Parts for Person Re-identification
Pub Date : 2017-10-18 DOI: 10.1109/CVPR.2017.782
Dangwei Li, Xiaotang Chen, Z. Zhang, Kaiqi Huang
Person Re-identification (ReID) is to identify the same person across different cameras. It is a challenging task due to the large variations in person pose, occlusion, background clutter, etc. How to extract powerful features is a fundamental problem in ReID and is still an open problem today. In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn powerful features over full body and body parts, which can well capture the local context knowledge by stacking multi-scale convolutions in each layer. Moreover, instead of using predefined rigid parts, we propose to learn and localize deformable pedestrian parts using Spatial Transformer Networks (STN) with novel spatial constraints. The learned body parts can release some difficulties, e.g. pose variations and background clutters, in part-based representation. Finally, we integrate the representation learning processes of full body and body parts into a unified framework for person ReID through multi-class person identification tasks. Extensive evaluations on current challenging large-scale person ReID datasets, including the image-based Market1501, CUHK03 and sequence-based MARS datasets, show that the proposed method achieves the state-of-the-art results.
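To illustrate what "stacking multi-scale convolutions in each layer" can look like, the sketch below (PyTorch, hypothetical class name) runs parallel dilated 3x3 convolutions over a feature map and concatenates them, so one layer sees several receptive-field sizes at once. It is only a rough stand-in for the MSCAN layer, and it does not include the STN-based latent part localization.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """One layer that covers several receptive-field sizes by running
    parallel dilated 3x3 convolutions and concatenating their outputs;
    a rough stand-in for the multi-scale convolutions MSCAN stacks."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(torch.cat([b(x) for b in self.branches], dim=1))

block = MultiScaleBlock(64, 96)
feat = block(torch.rand(2, 64, 96, 32))   # e.g. features of a pedestrian crop
print(feat.shape)                          # torch.Size([2, 96, 96, 32])
```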
Citations: 621
A Low Power, Fully Event-Based Gesture Recognition System
Pub Date : 2017-07-25 DOI: 10.1109/CVPR.2017.781
A. Amir, B. Taba, David J. Berg, T. Melano, J. McKinstry, C. D. Nolfo, T. Nayak, Alexander Andreopoulos, Guillaume J. Garreau, Marcela Mendoza, J. Kusnitz, M. DeBole, Steven K. Esser, T. Delbrück, M. Flickner, D. Modha
We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real-time at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured here as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.
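For readers unfamiliar with event cameras, a DVS emits a sparse stream of (timestamp, x, y, polarity) events rather than frames. The sketch below (NumPy, hypothetical function name) bins such a stream into 2-channel event-count frames that a conventional CNN could consume; the paper instead feeds the event stream to a spiking network on the TrueNorth processor, so this is only a frame-based illustration of the input data.

```python
import numpy as np

def events_to_frames(events, sensor_hw=(128, 128), window_us=50_000):
    """Bin a DVS event stream into 2-channel count frames (ON / OFF polarity).

    events: array with columns (t_us, x, y, polarity in {0, 1}).  This is a
    conventional-CNN front end for illustration only; TrueNorth consumes the
    raw asynchronous events directly.
    """
    t, x, y, p = (events[:, i].astype(np.int64) for i in range(4))
    n_windows = int(t.max() // window_us) + 1
    h, w = sensor_hw
    frames = np.zeros((n_windows, 2, h, w), dtype=np.float32)
    np.add.at(frames, (t // window_us, p, y, x), 1.0)   # accumulate event counts
    return frames

rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 200_000, 1000),      # timestamps in microseconds
               rng.integers(0, 128, 1000),          # x
               rng.integers(0, 128, 1000),          # y
               rng.integers(0, 2, 1000)], axis=1)   # polarity
print(events_to_frames(ev).shape)                   # (4, 2, 128, 128)
```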
Citations: 505
Temporal Residual Networks for Dynamic Scene Recognition
Pub Date : 2017-07-22 DOI: 10.1109/CVPR.2017.786
Christoph Feichtenhofer, A. Pinz, Richard P. Wildes
This paper combines three contributions to establish a new state-of-the-art in dynamic scene recognition. First, we present a novel ConvNet architecture based on temporal residual units that is fully convolutional in spacetime. Our model augments spatial ResNets with convolutions across time to hierarchically add temporal residuals as the depth of the network increases. Second, existing approaches to video-based recognition are categorized and a baseline of seven previously top performing algorithms is selected for comparative evaluation on dynamic scenes. Third, we introduce a new and challenging video database of dynamic scenes that more than doubles the size of those previously available. This dataset is explicitly split into two subsets of equal size that contain videos with and without camera motion to allow for systematic study of how this variable interacts with the defining dynamics of the scene per se. Our evaluations verify the particular strengths and weaknesses of the baseline algorithms with respect to various scene classes and camera motion parameters. Finally, our temporal ResNet boosts recognition performance and establishes a new state-of-the-art on dynamic scene recognition, as well as on the complementary task of action recognition.
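A generic sketch of a temporal residual unit is given below (PyTorch, hypothetical class name): a residual branch that also convolves across the time axis is summed with the identity shortcut, so temporal residuals accumulate as blocks are stacked. It is not the paper's exact unit, only an illustration of the idea.

```python
import torch
import torch.nn as nn

class TemporalResidualUnit(nn.Module):
    """Residual unit over a feature clip (N, C, T, H, W): the identity
    shortcut is summed with a branch whose convolutions also mix the time
    axis, so temporal residuals accumulate with network depth."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.temporal(self.relu(self.spatial(x)))
        return self.relu(x + residual)

unit = TemporalResidualUnit(32)
clip = torch.rand(2, 32, 8, 56, 56)      # features of an 8-frame clip
print(unit(clip).shape)                  # torch.Size([2, 32, 8, 56, 56])
```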
Citations: 75
Consistent-Aware Deep Learning for Person Re-identification in a Camera Network
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.362
Ji Lin, Liangliang Ren, Jiwen Lu, Jianjiang Feng, Jie Zhou
In this paper, we propose a consistent-aware deep learning (CADL) framework for person re-identification in a camera network. Unlike most existing person re-identification methods which identify whether two body images are from the same person, our approach aims to obtain the maximal correct matches for the whole camera network. Different from recently proposed camera network based re-identification methods which only consider the consistent information in the matching stage to obtain a global optimal association, we exploit such consistent-aware information under a deep learning framework where both feature representation and image matching are automatically learned with certain consistent constraints. Specifically, we reach the global optimal solution and balance the performance between different cameras by optimizing the similarity and association iteratively. Experimental results show that our method obtains significant performance improvement and outperforms the state-of-the-art methods by large margins.
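As a small illustration of the association step for a single camera pair, the sketch below (SciPy, hypothetical function name) turns a learned similarity matrix into a one-to-one assignment with the Hungarian algorithm. The full framework alternates this kind of globally consistent association with feature learning across the whole camera network, which this stand-alone snippet does not capture.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def consistent_matches(similarity):
    """One-to-one association between people seen by two cameras.

    similarity: (num_people_cam_a, num_people_cam_b) scores from some learned
    embedding.  Maximising total similarity under a one-to-one constraint is a
    stand-in for the consistency the full framework enforces network-wide.
    """
    rows, cols = linear_sum_assignment(-similarity)   # negate to maximise similarity
    return list(zip(rows.tolist(), cols.tolist()))

rng = np.random.default_rng(0)
sim = rng.random((4, 4))
print(consistent_matches(sim))
```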
Citations: 119