2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Papers


How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.237

Radu Tudor Ionescu, B. Alexe, Marius Leordeanu, M. Popescu, Dim P. Papadopoulos, V. Ferrari

We address the problem of estimating image difficulty, defined as the human response time for solving a visual search task. We collect human annotations of image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing platform. We then analyze which human-interpretable image properties affect visual search difficulty, and how accurately those properties predict difficulty. Next, we build a regression model based on deep features learned with state-of-the-art convolutional neural networks and show better results for predicting the ground-truth visual search difficulty scores produced by human annotators. Our model is able to correctly rank about 75% of image pairs according to their difficulty score. We also show that our difficulty predictor generalizes well to new classes not seen during training. Finally, we demonstrate that our predicted difficulty scores are useful for weakly supervised object localization (8% improvement) and semi-supervised object classification (1% improvement).
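
The "about 75% of image pairs" figure above is a pairwise ranking measure. The following is a minimal sketch of one plausible way to compute such a number, assuming it means the fraction of image pairs whose predicted order agrees with the ground-truth order; the function name and the choice to skip ground-truth ties are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def pairwise_ranking_accuracy(true_scores, predicted_scores):
    """Fraction of pairs whose predicted order matches the ground-truth order.

    Assumption: pairs with identical ground-truth scores are skipped,
    because they define no order to recover.
    """
    if len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    correct = 0
    total = 0
    for i, j in combinations(range(len(true_scores)), 2):
        true_diff = true_scores[i] - true_scores[j]
        if true_diff == 0:
            continue
        pred_diff = predicted_scores[i] - predicted_scores[j]
        total += 1
        # Same sign means the pair is ordered the same way in both lists.
        if true_diff * pred_diff > 0:
            correct += 1
    return correct / total if total else 0.0
```

For example, with ground-truth scores `[1, 2, 3, 4]` and predictions `[1, 3, 2, 4]`, only the middle pair is swapped, so five of the six pairs are ranked correctly.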

Citations: 112

Simultaneous Optical Flow and Intensity Estimation from an Event Camera

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.102

Patrick Bardow, A. Davison, Stefan Leutenegger

Event cameras are bio-inspired vision sensors which mimic retinas to measure per-pixel intensity change rather than outputting an actual intensity image. This proposed paradigm shift away from traditional frame cameras offers significant potential advantages: namely avoiding high data rates, dynamic range limitations and motion blur. Unfortunately, however, established computer vision algorithms may not at all be applied directly to event cameras. Methods proposed so far to reconstruct images, estimate optical flow, track a camera and reconstruct a scene come with severe restrictions on the environment or on the motion of the camera, e.g. allowing only rotation. Here, we propose, to the best of our knowledge, the first algorithm to simultaneously recover the motion field and brightness image, while the camera undergoes a generic motion through any scene. Our approach employs minimisation of a cost function that contains the asynchronous event data as well as spatial and temporal regularisation within a sliding window time interval. Our implementation relies on GPU optimisation and runs in near real-time. In a series of examples, we demonstrate the successful operation of our framework, including in situations where conventional cameras suffer from dynamic range limitations and motion blur.
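
To make the "per-pixel intensity change" output concrete, here is a minimal sketch of naive event accumulation. It assumes the commonly used event model in which each event reports a fixed-size brightness step of positive or negative polarity at one pixel; the tuple layout, the function name and the `contrast_threshold` value are hypothetical. This is not the joint cost-function minimisation the abstract describes, only an illustration of the raw data such a method starts from.

```python
from collections import defaultdict


def accumulate_events(events, contrast_threshold=0.1):
    """Sum signed brightness steps per pixel.

    events: iterable of (x, y, timestamp, polarity) with polarity +1 or -1.
    contrast_threshold: assumed size of the brightness step one event signals.
    Returns a dict mapping (x, y) to the accumulated brightness change.
    """
    change = defaultdict(float)
    for x, y, _timestamp, polarity in events:
        if polarity not in (1, -1):
            raise ValueError("polarity must be +1 or -1")
        change[(x, y)] += polarity * contrast_threshold
    return dict(change)
```

Accumulation of this kind recovers only changes relative to an unknown starting image and drifts with noise, which is one reason a regularised estimate of the kind described above is attractive.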

Pages: 884-892

Citations: 220

Multivariate Regression on the Grassmannian for Predicting Novel Domains

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.548

Yongxin Yang, Timothy M. Hospedales

We study the problem of predicting how to recognise visual objects in novel domains with neither labelled nor unlabelled training data. Domain adaptation is now an established research area due to its value in ameliorating the issue of domain shift between train and test data. However, it is conventionally assumed that domains are discrete entities, and that at least unlabelled data is provided in testing domains. In this paper, we consider the case where domains are parametrised by a vector of continuous values (e.g., time, lighting or view angle). We aim to use such domain metadata to predict novel domains for recognition. This allows a recognition model to be pre-calibrated for a new domain in advance (e.g., future time or view angle) without waiting for data collection and re-training. We achieve this by posing the problem as one of multivariate regression on the Grassmannian, where we regress a domain's subspace (point on the Grassmannian) against an independent vector of domain parameters. We derive two novel methodologies to achieve this challenging task: a direct kernel regression from $\mathbb{R}^M \to \mathcal{G}$, and an indirect method with better extrapolation properties. We evaluate our methods on two cross-domain visual recognition benchmarks, where they perform close to the upper bound of full data domain adaptation. This demonstrates that data is not necessary for domain adaptation if a domain can be parametrically described.

Pages: 5071-5080

Citations: 23

Sketch Me That Shoe

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.93

Qian Yu, Feng Liu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales, Chen Change Loy

We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketch-photo pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep triplet ranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and over-fitting avoidance when training deep networks for fine-grained cross-domain ranking tasks.
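
The triplet ranking idea mentioned above can be illustrated with a generic hinge-style triplet loss. This is a minimal sketch under stated assumptions: the squared Euclidean distance, the `margin` value and the function names are illustrative choices, and the embeddings are plain lists of floats rather than the output of any deep network. The abstract does not specify the exact loss, so this should not be read as the authors' formulation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(sketch, positive_photo, negative_photo, margin=0.3):
    """Hinge loss that is zero once the matching photo is closer to the
    sketch than the non-matching photo by at least `margin` (assumed value).
    """
    d_pos = squared_distance(sketch, positive_photo)
    d_neg = squared_distance(sketch, negative_photo)
    return max(0.0, margin + d_pos - d_neg)
```

With a sketch embedding `[0, 0]`, a matching photo at `[0, 1]` and a non-matching photo at `[2, 0]`, the triplet is already ordered correctly with room to spare and the loss is zero; swapping the two photos yields a positive loss that training would push down.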

Pages: 799-807

Citations: 379

Discovering the Physical Parts of an Articulated Object Class from Multiple Videos

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.84

Luca Del Pero, Susanna Ricco, R. Sukthankar, V. Ferrari

We propose a motion-based method to discover the physical parts of an articulated object class (e.g. head/torso/leg of a horse) from multiple videos. The key is to find object regions that exhibit consistent motion relative to the rest of the object, across multiple videos. We can then learn a location model for the parts and segment them accurately in the individual videos using an energy function that also enforces temporal and spatial consistency in part motion. Unlike our approach, traditional methods for motion segmentation or non-rigid structure from motion operate on one video at a time. Hence they cannot discover a part unless it displays independent motion in that particular video. We evaluate our method on a new dataset of 32 videos of tigers and horses, where we significantly outperform a recent motion segmentation method on the task of part discovery (obtaining roughly twice the accuracy).

Citations: 12

Kinematic Structure Correspondences via Hypergraph Matching

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.457

H. Chang, Tobias Fischer, Maxime Petit, Martina Zambelli, Y. Demiris

In this paper, we present a novel framework for finding the kinematic structure correspondence between two objects in videos via hypergraph matching. In contrast to prior appearance and graph alignment based matching methods which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Our main contributions can be summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem, incorporating multi-order similarities with normalising weights, (ii) a structural topology similarity measure by a new topology constrained subgraph isomorphism aggregation, (iii) a kinematic correlation measure between pairwise nodes, and (iv) a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on complex articulated synthetic and real data.

Citations: 17

iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-07-01 DOI: 10.1109/CVPR.2016.244

A. Borji, S. Izadi, L. Itti

Tolerance to image variations (e.g., translation, scale, pose, illumination, background) is an important desired property of any object recognition system, be it human or machine. Moving towards increasingly bigger datasets has been trending in computer vision, especially with the emergence of highly popular deep learning models. While being very useful for learning invariance to object inter- and intra-class shape variability, these large-scale wild datasets are not very useful for learning invariance to other parameters, urging researchers to resort to other tricks for training models. In this work, we introduce a large-scale synthetic dataset, which is freely and publicly available, and use it to answer several fundamental questions regarding selectivity and invariance properties of convolutional neural networks. Our dataset contains two parts: a) objects shot on a turntable: 15 categories, 8 rotation angles, 11 cameras on a semi-circular arch, 5 lighting conditions, 3 focus levels, variety of backgrounds (23.4 per instance) generating 1320 images per instance (about 22 million images in total), and b) scenes: in which a robotic arm takes pictures of objects on a 1:160 scale scene. We study: 1) invariance and selectivity of different CNN layers, 2) knowledge transfer from one object category to another, 3) systematic or random sampling of images to build a train set, 4) domain adaptation from synthetic to natural scenes, and 5) order of knowledge delivery to CNNs. We also discuss how our analyses can lead the field to develop more efficient deep learning methods.
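
The figure of 1320 images per instance follows from the capture factors listed in the abstract. The short check below multiplies them out; the variable and function names are illustrative. The overall total also depends on the number of object instances and on how backgrounds are counted, neither of which is given precisely here, so the sketch deliberately stops at the per-instance product.

```python
# Capture factors as listed in the abstract.
ROTATION_ANGLES = 8
CAMERAS = 11
LIGHTING_CONDITIONS = 5
FOCUS_LEVELS = 3


def images_per_capture_set():
    """Number of images from one full sweep of all capture factors."""
    return ROTATION_ANGLES * CAMERAS * LIGHTING_CONDITIONS * FOCUS_LEVELS
```

The product 8 x 11 x 5 x 3 equals 1320, matching the per-instance count quoted above.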

Pages: 2221-2230

Citations: 66

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.335

Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Recently, Deep Convolutional Neural Networks (DCNNs) have been applied to the task of human pose estimation, and have shown their potential for learning better feature representations and capturing contextual relationships. However, it is difficult to incorporate domain prior knowledge such as geometric relationships among body parts into DCNNs. In addition, training DCNN-based body part detectors without consideration of global body joint consistency introduces ambiguities, which increases the complexity of training. In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts. We explicitly incorporate domain prior knowledge into the framework, which greatly regularizes the learning process and enables the flexibility of our framework for loopy models or tree-structured models. The effectiveness of jointly learning a DCNN with a deformable mixture of parts model is evaluated through intensive experiments on several widely used benchmarks. The proposed approach significantly improves the performance compared with state-of-the-art approaches, especially on benchmarks with challenging articulations.

Pages: 3073-3082

Citations: 233

Visual Path Prediction in Complex Scenes with Crowded Moving Objects

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.292

Y. Yoo, Kimin Yun, Sangdoo Yun, Jonghee Hong, Hawook Jeong, J. Choi

This paper proposes a novel path prediction algorithm that goes one step further than existing works, which focus on single-target path prediction. In this paper, we consider the moving dynamics of co-occurring objects for path prediction in a scene that includes crowded moving objects. To solve this problem, we first suggest a two-layered probabilistic model to find major movement patterns and their co-occurrence tendency. By utilizing the unsupervised learning results from the model, we present an algorithm to find the future location of any target object. Through extensive qualitative/quantitative experiments, we show that our algorithm can find a plausible future path in complex scenes with a large number of moving objects.

Citations: 34

HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.393

G. Máttyus, Shenlong Wang, S. Fidler, R. Urtasun

In this paper we present an approach to enhance existing maps with fine-grained segmentation categories such as parking spots and sidewalk, as well as the number and location of road lanes. Towards this goal, we propose an efficient approach that is able to estimate these fine-grained categories by doing joint inference over both monocular aerial imagery and ground images taken from a stereo camera pair mounted on top of a car. Important to this is reasoning about the alignment between the two types of imagery, as even when the measurements are taken with sophisticated GPS+IMU systems, this alignment is not sufficiently accurate. We demonstrate the effectiveness of our approach on a new dataset which enhances KITTI [8] with aerial images taken with a camera mounted on an airplane flying around the city of Karlsruhe, Germany.

Citations: 131
