
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Learning the Multilinear Structure of Visual Data
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.641
Mengjiao MJ Wang, Yannis Panagakis, Patrick Snape, S. Zafeiriou
Statistical decomposition methods are of paramount importance in discovering the modes of variation of visual data. Probably the most prominent linear decomposition method is Principal Component Analysis (PCA), which discovers a single mode of variation in the data. However, in practice, visual data exhibit several modes of variation. For instance, the appearance of faces varies in identity, expression, pose, etc. To extract these modes of variation from visual data, several supervised methods that rely on multilinear (tensor) decomposition (e.g., Higher Order SVD), such as TensorFaces, have been developed. The main drawback of such methods is that they require both labels for the modes of variation and the same number of samples under all modes of variation (e.g., the same face under different expressions, poses, etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose the first general multilinear method, to the best of our knowledge, that discovers the multilinear structure of visual data in an unsupervised setting, that is, without the presence of labels. We demonstrate the applicability of the proposed method in two applications, namely Shape from Shading (SfS) and expression transfer.
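As a rough illustration of the Higher Order SVD mentioned above, the following NumPy sketch computes a Tucker-style decomposition of a small data tensor; the tensor shape (identities x expressions x pixels) and the plain per-mode SVD are illustrative assumptions, not the paper's actual pipeline.
```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor):
    """Classical HOSVD: one factor matrix per mode from its unfolding, then the core."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for mode, U in enumerate(factors):
        # Mode-n product of the core with U^T.
        core = np.moveaxis(np.tensordot(U.T, core, axes=([1], [mode])), 0, mode)
    return factors, core

# Hypothetical visual data: 10 identities x 5 expressions x 64 pixels.
X = np.random.rand(10, 5, 64)
factors, core = hosvd(X)
print([U.shape for U in factors], core.shape)
```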
{"title":"Learning the Multilinear Structure of Visual Data","authors":"Mengjiao MJ Wang, Yannis Panagakis, Patrick Snape, S. Zafeiriou","doi":"10.1109/CVPR.2017.641","DOIUrl":"https://doi.org/10.1109/CVPR.2017.641","url":null,"abstract":"Statistical decomposition methods are of paramount importance in discovering the modes of variations of visual data. Probably the most prominent linear decomposition method is the Principal Component Analysis (PCA), which discovers a single mode of variation in the data. However, in practice, visual data exhibit several modes of variations. For instance, the appearance of faces varies in identity, expression, pose etc. To extract these modes of variations from visual data, several supervised methods, such as the TensorFaces, that rely on multilinear (tensor) decomposition (e.g., Higher Order SVD) have been developed. The main drawbacks of such methods is that they require both labels regarding the modes of variations and the same number of samples under all modes of variations (e.g., the same face under different expressions, poses etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose the first general multilinear method, to the best of our knowledge, that discovers the multilinear structure of visual data in unsupervised setting. That is, without the presence of labels. We demonstrate the applicability of the proposed method in two applications, namely Shape from Shading (SfS) and expression transfer.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"5 1","pages":"6053-6061"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76382666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Fine-Grained Recognition as HSnet Search for Informative Image Parts
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.688
Michael Lam, Behrooz Mahasseni, S. Todorovic
This work addresses fine-grained image classification. Our work is based on the hypothesis that when dealing with subtle differences among object classes it is critical to identify and only account for a few informative image parts, as the remaining image context may not only be uninformative but may also hurt recognition. This motivates us to formulate our problem as a sequential search for informative parts over a deep feature map produced by a deep Convolutional Neural Network (CNN). A state of this search is a set of proposal bounding boxes in the image, whose informativeness is evaluated by the heuristic function (H), and used for generating new candidate states by the successor function (S). The two functions are unified via a Long Short-Term Memory network (LSTM) into a new deep recurrent architecture, called HSnet. Thus, HSnet (i) generates proposals of informative image parts and (ii) fuses all proposals toward final fine-grained recognition. We specify both supervised and weakly supervised training of HSnet depending on the availability of object part annotations. Evaluation on the benchmark Caltech-UCSD Birds 200-2011 and Cars-196 datasets demonstrate our competitive performance relative to the state of the art.
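The sequential search the abstract outlines can be sketched as below, with trivial placeholder functions standing in for the learned, LSTM-unified heuristic H and successor S; the box size, the random proposals, and the mean-activation score are all assumptions made for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.random((32, 32))           # toy deep feature map (H x W)

def heuristic_H(boxes):
    """Placeholder informativeness score: mean activation inside each box."""
    return np.array([feature_map[y0:y1, x0:x1].mean() for y0, x0, y1, x1 in boxes])

def successor_S(state):
    """Placeholder successor: propose random 8x8 boxes not already in the state."""
    cand = [(y0, x0, y0 + 8, x0 + 8)
            for y0, x0 in rng.integers(0, 24, size=(20, 2))]
    return [b for b in cand if tuple(b) not in state]

state = []                                    # current set of proposal boxes
for step in range(4):                         # sequentially add informative parts
    candidates = successor_S(state)
    scores = heuristic_H(candidates)
    state.append(tuple(candidates[int(np.argmax(scores))]))
print(state)
```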
{"title":"Fine-Grained Recognition as HSnet Search for Informative Image Parts","authors":"Michael Lam, Behrooz Mahasseni, S. Todorovic","doi":"10.1109/CVPR.2017.688","DOIUrl":"https://doi.org/10.1109/CVPR.2017.688","url":null,"abstract":"This work addresses fine-grained image classification. Our work is based on the hypothesis that when dealing with subtle differences among object classes it is critical to identify and only account for a few informative image parts, as the remaining image context may not only be uninformative but may also hurt recognition. This motivates us to formulate our problem as a sequential search for informative parts over a deep feature map produced by a deep Convolutional Neural Network (CNN). A state of this search is a set of proposal bounding boxes in the image, whose informativeness is evaluated by the heuristic function (H), and used for generating new candidate states by the successor function (S). The two functions are unified via a Long Short-Term Memory network (LSTM) into a new deep recurrent architecture, called HSnet. Thus, HSnet (i) generates proposals of informative image parts and (ii) fuses all proposals toward final fine-grained recognition. We specify both supervised and weakly supervised training of HSnet depending on the availability of object part annotations. Evaluation on the benchmark Caltech-UCSD Birds 200-2011 and Cars-196 datasets demonstrate our competitive performance relative to the state of the art.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"15 1","pages":"6497-6506"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87117712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 105
LCR-Net: Localization-Classification-Regression for Human Pose
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.134
Grégory Rogez, Philippe Weinzaepfel, C. Schmid
We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D pose of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our architecture, named LCR-Net, contains 3 main components: 1) the pose proposal generator that suggests potential poses at different locations in the image, 2) a classifier that scores the different pose proposals, and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard non maximum suppression algorithm. Our approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment. Moreover, it shows promising results on real images for both single and multi-person subsets of the MPII 2D pose benchmark.
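The final integration step, averaging pose hypotheses near the top-scoring one rather than keeping only the NMS winner, can be sketched roughly like this; the number of joints, the distance threshold, and the score weighting are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(1)
poses = rng.random((50, 13, 2)) * 100      # 50 pose proposals, 13 joints, 2D
scores = rng.random(50)                    # classifier score per proposal

best = int(np.argmax(scores))
# Mean per-joint distance of every hypothesis to the best-scoring one.
dists = np.linalg.norm(poses - poses[best], axis=-1).mean(axis=1)
neighbors = dists < 20.0                   # hypotheses "near" the best pose

w = scores[neighbors]
final_pose = (poses[neighbors] * w[:, None, None]).sum(axis=0) / w.sum()
print(final_pose.shape)                    # (13, 2) refined 2D pose
```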
{"title":"LCR-Net: Localization-Classification-Regression for Human Pose","authors":"Grégory Rogez, Philippe Weinzaepfel, C. Schmid","doi":"10.1109/CVPR.2017.134","DOIUrl":"https://doi.org/10.1109/CVPR.2017.134","url":null,"abstract":"We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D pose of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our architecture, named LCR-Net, contains 3 main components: 1) the pose proposal generator that suggests potential poses at different locations in the image, 2) a classifier that scores the different pose proposals, and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard non maximum suppression algorithm. Our approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment. Moreover, it shows promising results on real images for both single and multi-person subsets of the MPII 2D pose benchmark.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"181 1","pages":"1216-1224"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85560699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 280
Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.57
Wadim Kehl, Federico Tombari, Slobodan Ilic, Nassir Navab
We present a novel method to track 3D models in color and depth data. To this end, we introduce approximations that accelerate the state of the art in region-based tracking by an order of magnitude while retaining similar accuracy. Furthermore, we show how the method can be made more robust in the presence of depth data and consequently formulate a new joint contour and ICP tracking energy. We present better results than the state of the art while being much faster than most other methods, and achieve all of the above on a single CPU core.
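A minimal sketch of what a joint contour-plus-ICP tracking energy can look like, assuming simplified point-to-point ICP residuals and a generic region term along the contour; the exact residuals and weights in the paper differ.
```python
import numpy as np

def icp_residuals(model_pts, depth_pts):
    """Point-to-point residuals between transformed model points and observed depth points."""
    return np.linalg.norm(model_pts - depth_pts, axis=1)

def contour_residuals(fg_prob_along_contour):
    """Region term: penalize contour pixels that look like background."""
    return 1.0 - fg_prob_along_contour

def joint_energy(model_pts, depth_pts, fg_prob, w_contour=0.5, w_icp=0.5):
    e_icp = (icp_residuals(model_pts, depth_pts) ** 2).sum()
    e_contour = (contour_residuals(fg_prob) ** 2).sum()
    return w_contour * e_contour + w_icp * e_icp

# Toy data: 100 corresponding 3D points and 200 contour samples.
rng = np.random.default_rng(2)
model = rng.random((100, 3))
depth = model + 0.01 * rng.standard_normal((100, 3))
fg_prob = rng.random(200)
print(joint_energy(model, depth, fg_prob))
```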
{"title":"Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core","authors":"Wadim Kehl, Federico Tombari, Slobodan Ilic, Nassir Navab","doi":"10.1109/CVPR.2017.57","DOIUrl":"https://doi.org/10.1109/CVPR.2017.57","url":null,"abstract":"We present a novel method to track 3D models in color and depth data. To this end, we introduce approximations that accelerate the state-of-the-art in region-based tracking by an order of magnitude while retaining similar accuracy. Furthermore, we show how the method can be made more robust in the presence of depth data and consequently formulate a new joint contour and ICP tracking energy. We present better results than the state-of-the-art while being much faster then most other methods and achieving all of the above on a single CPU core.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"73 1","pages":"465-473"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86373315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
Weakly Supervised Affordance Detection
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.552
Johann Sawatzky, A. Srikantha, Juergen Gall
Localizing functional regions of objects or affordances is an important aspect of scene understanding and relevant for many robotics applications. In this work, we introduce a pixel-wise annotated affordance dataset of 3090 images containing 9916 object instances. Since parts of an object can have multiple affordances, we address this by a convolutional neural network for multilabel affordance segmentation. We also propose an approach to train the network from very few keypoint annotations. Our approach achieves a higher affordance detection accuracy than other weakly supervised methods that also rely on keypoint annotations or image annotations as weak supervision.
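One plausible way to turn sparse keypoint annotations into weak per-affordance training targets is to place Gaussian blobs around each keypoint, as sketched below; this construction is an assumption for illustration, not necessarily the paper's training procedure.
```python
import numpy as np

def keypoints_to_weak_masks(keypoints, n_classes, height, width, sigma=8.0):
    """keypoints: list of (row, col, affordance_class) annotations."""
    ys, xs = np.mgrid[0:height, 0:width]
    masks = np.zeros((n_classes, height, width))
    for r, c, cls in keypoints:
        blob = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
        masks[cls] = np.maximum(masks[cls], blob)   # a part may carry several affordances
    return masks

# Two hypothetical keypoints on a 128x128 image, 7 affordance classes.
weak_targets = keypoints_to_weak_masks([(40, 60, 0), (80, 30, 2)],
                                        n_classes=7, height=128, width=128)
print(weak_targets.shape)   # (7, 128, 128) soft multilabel targets
```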
{"title":"Weakly Supervised Affordance Detection","authors":"Johann Sawatzky, A. Srikantha, Juergen Gall","doi":"10.1109/CVPR.2017.552","DOIUrl":"https://doi.org/10.1109/CVPR.2017.552","url":null,"abstract":"Localizing functional regions of objects or affordances is an important aspect of scene understanding and relevant for many robotics applications. In this work, we introduce a pixel-wise annotated affordance dataset of 3090 images containing 9916 object instances. Since parts of an object can have multiple affordances, we address this by a convolutional neural network for multilabel affordance segmentation. We also propose an approach to train the network from very few keypoint annotations. Our approach achieves a higher affordance detection accuracy than other weakly supervised methods that also rely on keypoint annotations or image annotations as weak supervision.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"13 1","pages":"5197-5206"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90831863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
Learning to Detect Salient Objects with Image-Level Supervision
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.404
Lijun Wang, Huchuan Lu, Yifan Wang, Mengyang Feng, D. Wang, Baocai Yin, Xiang Ruan
Deep Neural Networks (DNNs) have substantially improved the state of the art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling the FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the use of existing large-scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised methods by a large margin, and achieves performance comparable or even superior to fully supervised counterparts.
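The role of a smooth global pooling layer, collapsing a per-class activation map into an image-level score so that image tags can act as supervision, can be illustrated with log-sum-exp pooling as a simple smooth stand-in; the paper's global smooth pooling layer has its own formulation.
```python
import numpy as np

def lse_pool(class_map, r=5.0):
    """Log-sum-exp pooling: smooth between average (r -> 0) and max (r -> inf) pooling."""
    return np.log(np.mean(np.exp(r * class_map))) / r

rng = np.random.default_rng(3)
activation = rng.random((20, 64, 64))               # 20 classes, 64x64 activation maps
image_scores = np.array([lse_pool(m) for m in activation])
print(image_scores.shape)                           # (20,) image-level class scores
```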
{"title":"Learning to Detect Salient Objects with Image-Level Supervision","authors":"Lijun Wang, Huchuan Lu, Yifan Wang, Mengyang Feng, D. Wang, Baocai Yin, Xiang Ruan","doi":"10.1109/CVPR.2017.404","DOIUrl":"https://doi.org/10.1109/CVPR.2017.404","url":null,"abstract":"Deep Neural Networks (DNNs) have substantially improved the state-of-the-art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised ones with a large margin, and achieves comparable or even superior performance than fully supervised counterparts.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"69 1","pages":"3796-3805"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90913385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 821
Snapshot Hyperspectral Light Field Imaging
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.727
Zhiwei Xiong, Lizhi Wang, Huiqun Li, Dong Liu, Feng Wu
This paper presents the first snapshot hyperspectral light field imager in practice. Specifically, we design a novel hybrid camera system to obtain two complementary measurements that sample the angular and spectral dimensions respectively. To recover the full 5D hyperspectral light field from the severely undersampled measurements, we then propose an efficient computational reconstruction algorithm by exploiting the large correlations across the angular and spectral dimensions through self-learned dictionaries. Simulation on an elaborate hyperspectral light field dataset validates the effectiveness of the proposed approach. Hardware experimental results demonstrate that, for the first time to our knowledge, a 5D hyperspectral light field containing 9x9 angular views and 27 spectral bands can be acquired in a single shot.
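Dictionary-based reconstruction from undersampled measurements of the kind the abstract relies on can be sketched with a basic ISTA solver for y = Phi D a with sparse codes a; the toy dictionary, measurement operator, and dimensions below are assumptions for illustration.
```python
import numpy as np

def ista(y, A, lam=0.05, n_iter=200):
    """Solve min_a 0.5*||y - A a||^2 + lam*||a||_1 by iterative shrinkage-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = a - step * A.T @ (A @ a - y)         # gradient step on the data term
        a = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)   # soft threshold
    return a

rng = np.random.default_rng(4)
D = rng.standard_normal((100, 200))       # toy dictionary for signal patches
Phi = rng.standard_normal((30, 100))      # undersampling measurement operator
a_true = np.zeros(200); a_true[[3, 50, 120]] = [1.0, -0.5, 2.0]
y = Phi @ D @ a_true                      # undersampled measurement
a_hat = ista(y, Phi @ D)
x_rec = D @ a_hat                         # reconstructed signal patch
print(np.count_nonzero(np.abs(a_hat) > 1e-3))
```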
{"title":"Snapshot Hyperspectral Light Field Imaging","authors":"Zhiwei Xiong, Lizhi Wang, Huiqun Li, Dong Liu, Feng Wu","doi":"10.1109/CVPR.2017.727","DOIUrl":"https://doi.org/10.1109/CVPR.2017.727","url":null,"abstract":"This paper presents the first snapshot hyperspectral light field imager in practice. Specifically, we design a novel hybrid camera system to obtain two complementary measurements that sample the angular and spectral dimensions respectively. To recover the full 5D hyperspectral light field from the severely undersampled measurements, we then propose an efficient computational reconstruction algorithm by exploiting the large correlations across the angular and spectral dimensions through self-learned dictionaries. Simulation on an elaborate hyperspectral light field dataset validates the effectiveness of the proposed approach. Hardware experimental results demonstrate that, for the first time to our knowledge, a 5D hyperspectral light field containing 9x9 angular views and 27 spectral bands can be acquired in a single shot.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"6873-6881"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74740411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
The Misty Three Point Algorithm for Relative Pose
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.484
Tobias Palmér, Kalle Åström, Jan-Michael Frahm
There is significant interest in scene reconstruction from underwater images given its utility for oceanic research and for recreational image manipulation. In this paper we propose a novel algorithm for two-view camera motion estimation for underwater imagery. Our method leverages the constraints provided by the attenuation properties of water, and their effect on the appearance of color, to determine the depth difference of a point with respect to the two observing views of the underwater cameras. Additionally, we propose an algorithm that leverages the depth differences of three such observed points to estimate the relative pose of the cameras. Given the unknown underwater attenuation coefficients, our method estimates the relative motion up to scale. The results are represented as a generalized camera. We evaluate our method on both real and simulated data.
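The physical cue behind the method, that under an exponential attenuation model the log-ratio of a point's observed intensities in the two views gives its depth difference only up to the unknown attenuation coefficient (hence pose up to scale), can be checked with a few lines of NumPy; the numbers are arbitrary.
```python
import numpy as np

beta = 0.12                         # unknown water attenuation coefficient
J = 0.8                             # unattenuated radiance of the scene point
d1, d2 = 3.0, 4.5                   # distances to the point from the two cameras

# Simple attenuation model: I = J * exp(-beta * d).
I1, I2 = J * np.exp(-beta * d1), J * np.exp(-beta * d2)
scaled_depth_diff = np.log(I1 / I2)          # equals beta * (d2 - d1)
print(scaled_depth_diff, beta * (d2 - d1))   # depth difference recovered only up to beta
```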
{"title":"The Misty Three Point Algorithm for Relative Pose","authors":"Tobias Palmér, Kalle Åström, Jan-Michael Frahm","doi":"10.1109/CVPR.2017.484","DOIUrl":"https://doi.org/10.1109/CVPR.2017.484","url":null,"abstract":"There is a significant interest in scene reconstruction from underwater images given its utility for oceanic research and for recreational image manipulation. In this paper we propose a novel algorithm for two view camera motion estimation for underwater imagery. Our method leverages the constraints provided by the attenuation properties of water and its effects on the appearance of the color to determine the depth difference of a point with respect to the two observing views of the underwater cameras. Additionally, we propose an algorithm, leveraging the depth differences of three such observed points, to estimate the relative pose of the cameras. Given the unknown underwater attenuation coefficients, our method estimates the relative motion up to scale. The results are represented as a generalized camera. We evaluate our method on both real data and simulated data.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"26 1","pages":"4551-4559"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83166753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.649
Yurun Tian, Bin Fan, Fuchao Wu
The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn a high-performance descriptor in Euclidean space via a Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training samples in a few epochs. (ii) Derived from the basic concept of the local patch matching problem, we emphasize the relative distance between descriptors. (iii) Extra supervision is imposed on the intermediate feature maps. (iv) Compactness of the descriptor is taken into account. The proposed network is named L2-Net since the output descriptors can be matched in Euclidean space by L2 distance. L2-Net achieves state-of-the-art performance on the Brown datasets [16], the Oxford dataset [18] and the newly proposed HPatches dataset [11]. The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitute for existing handcrafted descriptors. The pre-trained L2-Net is publicly available.
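Matching descriptors by L2 distance, and a relative-distance objective in which the correct counterpart in a batch should be the nearest one, can be sketched as below; the softmin-based loss is an illustrative variant rather than the paper's exact term.
```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal((16, 128))
a /= np.linalg.norm(a, axis=1, keepdims=True)       # unit-norm descriptors, set A
b = a + 0.1 * rng.standard_normal((16, 128))
b /= np.linalg.norm(b, axis=1, keepdims=True)       # b[i] is the match of a[i]

# Pairwise L2 distance matrix between the two descriptor sets.
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

matches = dist.argmin(axis=1)                        # nearest-neighbour matching
p = np.exp(-dist) / np.exp(-dist).sum(axis=1, keepdims=True)   # row-wise softmin
relative_loss = -np.log(np.diag(p)).mean()           # the correct pair should be nearest
print((matches == np.arange(16)).mean(), relative_loss)
```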
{"title":"L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space","authors":"Yurun Tian, Bin Fan, Fuchao Wu","doi":"10.1109/CVPR.2017.649","DOIUrl":"https://doi.org/10.1109/CVPR.2017.649","url":null,"abstract":"The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn high performance descriptor in Euclidean space via the Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training samples in a few epochs. (ii) Derived from the basic concept of local patch matching problem, we empha-size the relative distance between descriptors. (iii) Extra supervision is imposed on the intermediate feature maps. (iv) Compactness of the descriptor is taken into account. The proposed network is named as L2-Net since the output descriptor can be matched in Euclidean space by L2 distance. L2-Net achieves state-of-the-art performance on the Brown datasets [16], Oxford dataset [18] and the newly proposed Hpatches dataset [11]. The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitution of the existing handcrafted descriptors. The pre-trained L2-Net is publicly available.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"6128-6136"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83200304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 425
Learned Contextual Feature Reweighting for Image Geo-Localization
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.346
Hyo Jin Kim, Enrique Dunn, Jan-Michael Frahm
We address the problem of large scale image geo-localization where the location of an image is estimated by identifying geo-tagged reference images depicting the same place. We propose a novel model for learning image representations that integrates context-aware feature reweighting in order to effectively focus on regions that positively contribute to geo-localization. In particular, we introduce a Contextual Reweighting Network (CRN) that predicts the importance of each region in the feature map based on the image context. Our model is learned end-to-end for the image geo-localization task, and requires no annotation other than image geo-tags for training. In experimental results, the proposed approach significantly outperforms the previous state-of-the-art on the standard geo-localization benchmark datasets. We also demonstrate that our CRN discovers task-relevant contexts without any additional supervision.
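The reweighting idea, a spatial importance map multiplying local features before they are pooled into the image representation, can be sketched as follows; the placeholder weight predictor stands in for the learned CRN.
```python
import numpy as np

rng = np.random.default_rng(6)
features = rng.random((14, 14, 512))                 # conv feature map (H, W, C)

def predict_region_weights(feat):
    """Placeholder for the CRN: one non-negative weight per spatial location."""
    w = feat.mean(axis=-1)                           # toy context signal
    return w / w.sum()

weights = predict_region_weights(features)           # (14, 14) importance map
global_descriptor = (features * weights[..., None]).sum(axis=(0, 1))
print(global_descriptor.shape)                       # (512,) context-reweighted pooling
```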
{"title":"Learned Contextual Feature Reweighting for Image Geo-Localization","authors":"Hyo Jin Kim, Enrique Dunn, Jan-Michael Frahm","doi":"10.1109/CVPR.2017.346","DOIUrl":"https://doi.org/10.1109/CVPR.2017.346","url":null,"abstract":"We address the problem of large scale image geo-localization where the location of an image is estimated by identifying geo-tagged reference images depicting the same place. We propose a novel model for learning image representations that integrates context-aware feature reweighting in order to effectively focus on regions that positively contribute to geo-localization. In particular, we introduce a Contextual Reweighting Network (CRN) that predicts the importance of each region in the feature map based on the image context. Our model is learned end-to-end for the image geo-localization task, and requires no annotation other than image geo-tags for training. In experimental results, the proposed approach significantly outperforms the previous state-of-the-art on the standard geo-localization benchmark datasets. We also demonstrate that our CRN discovers task-relevant contexts without any additional supervision.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"64 2 1","pages":"3251-3260"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83269225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 167