
Latest publications from the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Learning Answer Embeddings for Visual Question Answering
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00569
Hexiang Hu, Wei-Lun Chao, Fei Sha
We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly and the other for the answers. The learning objective is to learn the best parameterization of those embeddings such that the correct answer has higher likelihood among all possible answers. In contrast to several existing approaches of treating Visual QA as multi-way classification, the proposed approach takes the semantic relationships (as characterized by the embeddings) among answers into consideration, instead of viewing them as independent ordinal numbers. Thus, the learned embedded function can be used to embed unseen answers (in the training dataset). These properties make the approach particularly appealing for transfer learning for open-ended Visual QA, where the source dataset on which the model is learned has limited overlapping with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic models. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results have shown that the approach performs well not only on in-domain learning but also on transfer learning.
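For intuition, a minimal sketch of the likelihood described above, assuming a softmax over inner products between the joint image-question embedding and the answer embeddings (all names, sizes, and the PyTorch formulation are illustrative, not the authors' released code):

```python
import torch
import torch.nn.functional as F

def answer_likelihood(iq_embed, answer_embeds):
    """iq_embed: (B, D) joint image-question embeddings.
    answer_embeds: (A, D) embeddings of all candidate answers.
    Returns (B, A) probabilities over the answer set."""
    logits = iq_embed @ answer_embeds.t()        # (B, A) similarity scores
    return F.softmax(logits, dim=1)              # normalize over all answers

# Training objective: negative log-likelihood of the correct answer.
B, A, D = 32, 3000, 256
iq = torch.randn(B, D, requires_grad=True)       # stand-in for f(image, question)
ans = torch.randn(A, D, requires_grad=True)      # stand-in for g(answer)
target = torch.randint(0, A, (B,))
loss = F.cross_entropy(iq @ ans.t(), target)     # equals -log p(correct answer)
loss.backward()
```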
{"title":"Learning Answer Embeddings for Visual Question Answering","authors":"Hexiang Hu, Wei-Lun Chao, Fei Sha","doi":"10.1109/CVPR.2018.00569","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00569","url":null,"abstract":"We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly and the other for the answers. The learning objective is to learn the best parameterization of those embeddings such that the correct answer has higher likelihood among all possible answers. In contrast to several existing approaches of treating Visual QA as multi-way classification, the proposed approach takes the semantic relationships (as characterized by the embeddings) among answers into consideration, instead of viewing them as independent ordinal numbers. Thus, the learned embedded function can be used to embed unseen answers (in the training dataset). These properties make the approach particularly appealing for transfer learning for open-ended Visual QA, where the source dataset on which the model is learned has limited overlapping with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic models. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results have shown that the approach performs well not only on in-domain learning but also on transfer learning.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"36 1","pages":"5428-5436"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88400297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
PoTion: Pose MoTion Representation for Action Recognition
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00734
Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, C. Schmid
Most state-of-the-art methods for action recognition rely on a two-stream architecture that processes appearance and motion independently. In this paper, we claim that considering them jointly offers rich information for action recognition. We introduce a novel representation that gracefully encodes the movement of some semantic keypoints. We use the human joints as these keypoints and term our Pose moTion representation PoTion. Specifically, we first run a state-of-the-art human pose estimator [4] and extract heatmaps for the human joints in each frame. We obtain our PoTion representation by temporally aggregating these probability maps. This is achieved by 'colorizing' each of them depending on the relative time of the frames in the video clip and summing them. This fixed-size representation for an entire video clip is suitable to classify actions using a shallow convolutional neural network. Our experimental evaluation shows that PoTion outperforms other state-of-the-art pose representations [6, 48]. Furthermore, it is complementary to standard appearance and motion streams. When combining PoTion with the recent two-stream I3D approach [5], we obtain state-of-the-art performance on the JHMDB, HMDB and UCF101 datasets.
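A rough sketch of the colorization-and-aggregation step as we read it; the two-channel linear color ramp, array shapes, and normalization are our own assumptions rather than the paper's exact scheme:

```python
import numpy as np

def potion(heatmaps):
    """heatmaps: (T, J, H, W) joint probability maps for T frames and J joints.
    Returns (J, H, W, C) time-colorized aggregation using an illustrative 2-channel ramp."""
    T = heatmaps.shape[0]
    t = np.linspace(0.0, 1.0, T)                  # relative time of each frame in the clip
    colors = np.stack([1.0 - t, t], axis=1)       # (T, 2): channel 0 fades out, channel 1 fades in
    out = np.einsum('tjhw,tc->jhwc', heatmaps, colors)
    return out / T                                # keep intensity independent of clip length

clip = np.random.rand(40, 18, 64, 64)             # e.g. 40 frames, 18 joints
rep = potion(clip)                                # fixed-size representation for a shallow CNN
print(rep.shape)                                  # (18, 64, 64, 2)
```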
{"title":"PoTion: Pose MoTion Representation for Action Recognition","authors":"Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, C. Schmid","doi":"10.1109/CVPR.2018.00734","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00734","url":null,"abstract":"Most state-of-the-art methods for action recognition rely on a two-stream architecture that processes appearance and motion independently. In this paper, we claim that considering them jointly offers rich information for action recognition. We introduce a novel representation that gracefully encodes the movement of some semantic keypoints. We use the human joints as these keypoints and term our Pose moTion representation PoTion. Specifically, we first run a state-of-the-art human pose estimator [4] and extract heatmaps for the human joints in each frame. We obtain our PoTion representation by temporally aggregating these probability maps. This is achieved by 'colorizing' each of them depending on the relative time of the frames in the video clip and summing them. This fixed-size representation for an entire video clip is suitable to classify actions using a shallow convolutional neural network. Our experimental evaluation shows that PoTion outperforms other state-of-the-art pose representations [6, 48]. Furthermore, it is complementary to standard appearance and motion streams. When combining PoTion with the recent two-stream I3D approach [5], we obtain state-of-the-art performance on the JHMDB, HMDB and UCF101 datasets.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"15 1","pages":"7024-7033"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82864839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 242
Fast and Robust Estimation for Unit-Norm Constrained Linear Fitting Problems
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00850
Daiki Ikami, T. Yamasaki, K. Aizawa
M-estimator using iteratively reweighted least squares (IRLS) is one of the best-known methods for robust estimation. However, IRLS is ineffective for robust unit-norm constrained linear fitting (UCLF) problems, such as fundamental matrix estimation because of a poor initial solution. We overcome this problem by developing a novel objective function and its optimization, named iteratively reweighted eigenvalues minimization (IREM). IREM is guaranteed to decrease the objective function and achieves fast convergence and high robustness. In robust fundamental matrix estimation, IREM performs approximately 5-500 times faster than random sampling consensus (RANSAC) while preserving comparable or superior robustness.
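To make the reweighted-eigenvalue idea concrete, a hedged sketch (not the authors' exact IREM update; the Geman-McClure-style weight is an illustrative robust kernel): each iteration reweights the residuals and solves the weighted unit-norm problem, whose minimizer is the eigenvector of the smallest eigenvalue.

```python
import numpy as np

def robust_unit_norm_fit(A, iters=50, c=1.0):
    """Robustly minimize sum_i rho(a_i . x) subject to ||x|| = 1.
    A: (N, D) constraint rows; returns a (D,) unit-norm solution."""
    x = np.linalg.svd(A)[2][-1]                   # least-squares init: smallest right singular vector
    for _ in range(iters):
        r = A @ x                                  # residuals under the current estimate
        w = 1.0 / (1.0 + (r / c) ** 2) ** 2        # robust reweighting (illustrative kernel)
        M = A.T @ (w[:, None] * A)                 # weighted normal matrix
        vals, vecs = np.linalg.eigh(M)
        x = vecs[:, 0]                             # eigenvector of the smallest eigenvalue
    return x

A = np.random.randn(200, 9)                        # e.g. rows of a fundamental-matrix constraint
x = robust_unit_norm_fit(A)
print(np.linalg.norm(x))                           # 1.0 by construction
```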
{"title":"Fast and Robust Estimation for Unit-Norm Constrained Linear Fitting Problems","authors":"Daiki Ikami, T. Yamasaki, K. Aizawa","doi":"10.1109/CVPR.2018.00850","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00850","url":null,"abstract":"M-estimator using iteratively reweighted least squares (IRLS) is one of the best-known methods for robust estimation. However, IRLS is ineffective for robust unit-norm constrained linear fitting (UCLF) problems, such as fundamental matrix estimation because of a poor initial solution. We overcome this problem by developing a novel objective function and its optimization, named iteratively reweighted eigenvalues minimization (IREM). IREM is guaranteed to decrease the objective function and achieves fast convergence and high robustness. In robust fundamental matrix estimation, IREM performs approximately 5-500 times faster than random sampling consensus (RANSAC) while preserving comparable or superior robustness.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"57 1","pages":"8147-8155"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82842822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
3D Object Detection with Latent Support Surfaces
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00104
Zhile Ren, Erik B. Sudderth
We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have limited power to represent objects with different visual styles. The detection of small objects is also challenging because the search space is very large in 3D scenes. However, we observe that much of the shape variation within 3D object categories can be explained by the location of a latent support surface, and smaller objects are often supported by larger objects. Therefore, we explicitly use latent support surfaces to better represent the 3D appearance of large objects, and provide contextual cues to improve the detection of small objects. We evaluate our model with 19 object categories from the SUN RGB-D database, and demonstrate state-of-the-art performance.
{"title":"3D Object Detection with Latent Support Surfaces","authors":"Zhile Ren, Erik B. Sudderth","doi":"10.1109/CVPR.2018.00104","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00104","url":null,"abstract":"We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have limited power to represent objects with different visual styles. The detection of small objects is also challenging because the search space is very large in 3D scenes. However, we observe that much of the shape variation within 3D object categories can be explained by the location of a latent support surface, and smaller objects are often supported by larger objects. Therefore, we explicitly use latent support surfaces to better represent the 3D appearance of large objects, and provide contextual cues to improve the detection of small objects. We evaluate our model with 19 object categories from the SUN RGB-D database, and demonstrate state-of-the-art performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"937-946"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88823195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
A Deeper Look at Power Normalizations
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00605
Piotr Koniusz, Hongguang Zhang, F. Porikli
Power Normalizations (PN) are very useful non-linear operators in the context of Bag-of-Words data representations as they tackle problems such as feature imbalance. In this paper, we reconsider these operators in the deep learning setup by introducing a novel layer that implements PN for non-linear pooling of feature maps. Specifically, by using a kernel formulation, our layer combines the feature vectors and their respective spatial locations in the feature maps produced by the last convolutional layer of the CNN. Linearization of such a kernel results in a positive definite matrix capturing the second-order statistics of the feature vectors, to which PN operators are applied. We study two types of PN functions, namely (i) MaxExp and (ii) Gamma, addressing their role and meaning in the context of nonlinear pooling. We also provide a probabilistic interpretation of these operators and derive their surrogates with well-behaved gradients for end-to-end CNN learning. We apply our theory to practice by implementing the PN layer on a ResNet-50 model and showcase experiments on four benchmarks for fine-grained recognition, scene recognition, and material classification. Our results demonstrate state-of-the-art performance across all these tasks.
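As an illustration of the two operators, a small sketch assuming element-wise application to a second-order statistic of non-negative features; the exact parameterization, surrogates, and gradients in the paper differ:

```python
import numpy as np

def second_order_pool(feats):
    """feats: (N, D) non-negative feature vectors (e.g. post-ReLU CNN activations).
    Returns the (D, D) second-order statistic scaled to [0, 1]."""
    M = feats.T @ feats / feats.shape[0]
    return M / (M.max() + 1e-12)

def max_exp(M, eta=20.0):
    return 1.0 - (1.0 - np.clip(M, 0.0, 1.0)) ** eta   # saturating "co-occurrence detected" pooling

def gamma_pn(M, gamma=0.5):
    return np.clip(M, 0.0, None) ** gamma               # element-wise power normalization

feats = np.abs(np.random.randn(100, 64))                 # stand-in for conv features
M = second_order_pool(feats)
print(max_exp(M).shape, gamma_pn(M).shape)               # (64, 64) (64, 64)
```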
{"title":"A Deeper Look at Power Normalizations","authors":"Piotr Koniusz, Hongguang Zhang, F. Porikli","doi":"10.1109/CVPR.2018.00605","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00605","url":null,"abstract":"Power Normalizations (PN) are very useful non-linear operators in the context of Bag-of-Words data representations as they tackle problems such as feature imbalance. In this paper, we reconsider these operators in the deep learning setup by introducing a novel layer that implements PN for non-linear pooling of feature maps. Specifically, by using a kernel formulation, our layer combines the feature vectors and their respective spatial locations in the feature maps produced by the last convolutional layer of CNN. Linearization of such a kernel results in a positive definite matrix capturing the second-order statistics of the feature vectors, to which PN operators are applied. We study two types of PN functions, namely (i) MaxExp and (ii) Gamma, addressing their role and meaning in the context of nonlinear pooling. We also provide a probabilistic interpretation of these operators and derive their surrogates with well-behaved gradients for end-to-end CNN learning. We apply our theory to practice by implementing the PN layer on a ResNet-50 model and showcase experiments on four benchmarks for fine-grained recognition, scene recognition, and material classification. Our results demonstrate state-of-the-part performance across all these tasks.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"6 1","pages":"5774-5783"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88869228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Inverse Composition Discriminative Optimization for Point Cloud Registration
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00316
J. Vongkulbhisal, Beñat Irastorza Ugalde, F. D. L. Torre, J. Costeira
Rigid Point Cloud Registration (PCReg) refers to the problem of finding the rigid transformation between two sets of point clouds. This problem is particularly important due to the advances in new 3D sensing hardware, and it is challenging because neither the correspondence nor the transformation parameters are known. Traditional local PCReg methods (e.g., ICP) rely on local optimization algorithms, which can get trapped in bad local minima in the presence of noise, outliers, bad initializations, etc. To alleviate these issues, this paper proposes Inverse Composition Discriminative Optimization (ICDO), an extension of Discriminative Optimization (DO), which learns a sequence of update steps from synthetic training data that search the parameter space for an improved solution. Unlike DO, ICDO is object-independent and generalizes even to unseen shapes. We evaluated ICDO on both synthetic and real data, and show that ICDO can match the speed and outperform the accuracy of state-of-the-art PCReg algorithms.
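A schematic of the learned update sequence in the DO family that ICDO extends (the feature function and regression maps below are placeholders; the inverse-composition structure itself is omitted):

```python
import numpy as np

def run_learned_updates(x0, feature_fn, maps):
    """x0: initial transformation parameters.
    feature_fn: maps the current parameters (and the point clouds it closes over) to a feature vector.
    maps: list of learned matrices D_k, regressed from synthetic training registrations.
    Applies the sequence x_{k+1} = x_k - D_k @ h(x_k)."""
    x = x0.copy()
    for D in maps:
        x = x - D @ feature_fn(x)
    return x

# Toy usage with random placeholders standing in for learned quantities.
dim_x, dim_h = 6, 128                               # e.g. 6-DoF rigid parameters, 128-D features
maps = [np.zeros((dim_x, dim_h)) for _ in range(10)]
feature_fn = lambda x: np.random.rand(dim_h)        # placeholder feature extractor
x_hat = run_learned_updates(np.zeros(dim_x), feature_fn, maps)
```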
{"title":"Inverse Composition Discriminative Optimization for Point Cloud Registration","authors":"J. Vongkulbhisal, Beñat Irastorza Ugalde, F. D. L. Torre, J. Costeira","doi":"10.1109/CVPR.2018.00316","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00316","url":null,"abstract":"Rigid Point Cloud Registration (PCReg) refers to the problem of finding the rigid transformation between two sets of point clouds. This problem is particularly important due to the advances in new 3D sensing hardware, and it is challenging because neither the correspondence nor the transformation parameters are known. Traditional local PCReg methods (e.g., ICP) rely on local optimization algorithms, which can get trapped in bad local minima in the presence of noise, outliers, bad initializations, etc. To alleviate these issues, this paper proposes Inverse Composition Discriminative Optimization (ICDO), an extension of Discriminative Optimization (DO), which learns a sequence of update steps from synthetic training data that search the parameter space for an improved solution. Unlike DO, ICDO is object-independent and generalizes even to unseen shapes. We evaluated ICDO on both synthetic and real data, and show that ICDO can match the speed and outperform the accuracy of state-of-the-art PCReg algorithms.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"60 1","pages":"2993-3001"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84630422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Soccer on Your Tabletop
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00498
Konstantinos Rematas, Ira Kemelmacher-Shlizerman, B. Curless, S. Seitz
We present a system that transforms a monocular video of a soccer game into a moving 3D reconstruction, in which the players and field can be rendered interactively with a 3D viewer or through an Augmented Reality device. At the heart of our paper is an approach to estimate the depth map of each player, using a CNN that is trained on 3D player data extracted from soccer video games. We compare with state of the art body pose and depth estimation techniques, and show results on both synthetic ground truth benchmarks, and real YouTube soccer footage.
{"title":"Soccer on Your Tabletop","authors":"Konstantinos Rematas, Ira Kemelmacher-Shlizerman, B. Curless, S. Seitz","doi":"10.1109/CVPR.2018.00498","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00498","url":null,"abstract":"We present a system that transforms a monocular video of a soccer game into a moving 3D reconstruction, in which the players and field can be rendered interactively with a 3D viewer or through an Augmented Reality device. At the heart of our paper is an approach to estimate the depth map of each player, using a CNN that is trained on 3D player data extracted from soccer video games. We compare with state of the art body pose and depth estimation techniques, and show results on both synthetic ground truth benchmarks, and real YouTube soccer footage.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"10 1","pages":"4738-4747"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84641375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
Depth and Transient Imaging with Compressive SPAD Array Cameras
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00036
Qilin Sun, Xiong Dun, Yifan Peng, W. Heidrich
Time-of-flight depth imaging and transient imaging are two imaging modalities that have recently received a lot of interest. Despite much research, existing hardware systems are limited either in terms of temporal resolution or are prohibitively expensive. Arrays of Single Photon Avalanche Diodes (SPADs) promise to fill this gap by providing higher temporal resolution at an affordable cost. Unfortunately SPAD arrays are to date only available in relatively small resolutions. In this work we aim to overcome the spatial resolution limit of SPAD arrays by employing a compressive sensing camera design. Using a DMD and custom optics, we achieve an image resolution of up to 800×400 on SPAD Arrays of resolution 64×32. Using our new data fitting model for the time histograms, we suppress the noise while abstracting the phase and amplitude information, so as to realize a temporal resolution of a few tens of picoseconds.
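A toy version of the compressive measurement model, with random binary masks standing in for the DMD patterns; the paper's optics, SPAD timing histograms, and reconstruction algorithm are not modeled here:

```python
import numpy as np

def compressive_measure(x, masks):
    """x: (H*W,) flattened high-resolution scene.
    masks: (M, H*W) binary coding patterns applied before the low-resolution sensor.
    Returns M coded measurements."""
    return masks @ x

H, W, M = 40, 40, 400                                # toy sizes, far below 800x400
x_true = np.zeros(H * W); x_true[::37] = 1.0         # sparse toy scene
masks = (np.random.rand(M, H * W) > 0.5).astype(float)
y = compressive_measure(x_true, masks)

# Recovery would solve min ||x||_1 subject to masks @ x = y with a convex solver;
# here we only show the forward model plus a naive least-squares baseline.
x_ls = np.linalg.lstsq(masks, y, rcond=None)[0]
```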
{"title":"Depth and Transient Imaging with Compressive SPAD Array Cameras","authors":"Qilin Sun, Xiong Dun, Yifan Peng, W. Heidrich","doi":"10.1109/CVPR.2018.00036","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00036","url":null,"abstract":"Time-of-flight depth imaging and transient imaging are two imaging modalities that have recently received a lot of interest. Despite much research, existing hardware systems are limited either in terms of temporal resolution or are prohibitively expensive. Arrays of Single Photon Avalanche Diodes (SPADs) promise to fill this gap by providing higher temporal resolution at an affordable cost. Unfortunately SPAD arrays are to date only available in relatively small resolutions. In this work we aim to overcome the spatial resolution limit of SPAD arrays by employing a compressive sensing camera design. Using a DMD and custom optics, we achieve an image resolution of up to 800×400 on SPAD Arrays of resolution 64×32. Using our new data fitting model for the time histograms, we suppress the noise while abstracting the phase and amplitude information, so as to realize a temporal resolution of a few tens of picoseconds.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"66 1","pages":"273-282"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88608364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Tensorize, Factorize and Regularize: Robust Visual Relationship Learning
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00112
Seong Jae Hwang, Sathya Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh
Visual relationships provide higher-level information of objects and their relations in an image - this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely "relationship" between objects in a given image. While the specific objects may be well represented in training data, their relationships may still be infrequent. The empirical distribution obtained from seeing these relationships in a dataset does not model the underlying distribution well - a serious issue for most learning methods. In this work, we start from a simple multi-relational learning model, which in principle, offers a rich formalization for deriving a strong prior for learning visual relationships. While the inference problem for deriving the regularizer is challenging, our main technical contribution is to show how adapting recent results in numerical linear algebra lead to efficient algorithms for a factorization scheme that yields highly informative priors. The factorization provides sample size bounds for inference (under mild conditions) for the underlying [object, predicate, object] relationship learning task on its own and surprisingly outperforms (in some cases) existing methods even without utilizing visual features. Then, when integrated with an end-to-end architecture for visual relationship detection leveraging image data, we substantially improve the state-of-the-art.
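For a concrete picture of the factorization, an illustrative RESCAL-style bilinear scoring of [object, predicate, object] triples; the paper's actual factorization scheme and regularizer are more involved:

```python
import numpy as np

def triple_score(e_subj, R_pred, e_obj):
    """Bilinear score for a (subject, predicate, object) triple.
    e_subj, e_obj: (D,) object embeddings; R_pred: (D, D) per-predicate relation matrix."""
    return e_subj @ R_pred @ e_obj

n_obj, n_pred, D = 150, 50, 32
E = np.random.randn(n_obj, D) * 0.1                  # object embeddings (low-rank factor)
R = np.random.randn(n_pred, D, D) * 0.1              # one matrix per predicate
s = triple_score(E[3], R[7], E[42])                  # e.g. score for (person, riding, horse)
print(float(s))
```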
{"title":"Tensorize, Factorize and Regularize: Robust Visual Relationship Learning","authors":"Seong Jae Hwang, Sathya Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh","doi":"10.1109/CVPR.2018.00112","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00112","url":null,"abstract":"Visual relationships provide higher-level information of objects and their relations in an image - this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely \"relationship\" between objects in a given image. While the specific objects may be well represented in training data, their relationships may still be infrequent. The empirical distribution obtained from seeing these relationships in a dataset does not model the underlying distribution well - a serious issue for most learning methods. In this work, we start from a simple multi-relational learning model, which in principle, offers a rich formalization for deriving a strong prior for learning visual relationships. While the inference problem for deriving the regularizer is challenging, our main technical contribution is to show how adapting recent results in numerical linear algebra lead to efficient algorithms for a factorization scheme that yields highly informative priors. The factorization provides sample size bounds for inference (under mild conditions) for the underlying [object, predicate, object] relationship learning task on its own and surprisingly outperforms (in some cases) existing methods even without utilizing visual features. Then, when integrated with an end-to-end architecture for visual relationship detection leveraging image data, we substantially improve the state-of-the-art.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"146 1","pages":"1014-1023"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88636899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00553
Yanyu Xu, Zhixin Piao, Shenghua Gao
Pedestrian trajectory prediction is a challenging task because of the complex nature of humans. In this paper, we tackle the problem within a deep learning framework by considering motion information of each pedestrian and its interaction with the crowd. Specifically, motivated by the residual learning in deep learning, we propose to predict displacement between neighboring frames for each pedestrian sequentially. To predict such displacement, we design a crowd interaction deep neural network (CIDNN) which considers the different importance of different pedestrians for the displacement prediction of a target pedestrian. Specifically, we use an LSTM to model motion information for all pedestrians and use a multi-layer perceptron to map the location of each pedestrian to a high dimensional feature space where the inner product between features is used as a measurement for the spatial affinity between two pedestrians. Then we weight the motion features of all pedestrians based on their spatial affinity to the target pedestrian for location displacement prediction. Extensive experiments on publicly available datasets validate the effectiveness of our method for trajectory prediction.
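A compact sketch of the interaction step as described in the abstract; layer sizes and the final regressor are illustrative choices rather than the released CIDNN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrowdInteraction(nn.Module):
    """Illustrative CIDNN-style block: motion LSTM + location MLP + affinity-weighted pooling."""
    def __init__(self, hidden=64, loc_dim=32):
        super().__init__()
        self.motion_lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.loc_mlp = nn.Sequential(nn.Linear(2, loc_dim), nn.ReLU(), nn.Linear(loc_dim, loc_dim))
        self.out = nn.Linear(hidden, 2)               # next-frame displacement (dx, dy)

    def forward(self, tracks):
        # tracks: (P, T, 2) observed xy positions of P pedestrians over T frames
        motion, _ = self.motion_lstm(tracks)          # (P, T, hidden) motion features
        motion = motion[:, -1]                        # last-step motion feature per pedestrian
        loc = self.loc_mlp(tracks[:, -1])             # (P, loc_dim) current-location embedding
        affinity = F.softmax(loc @ loc.t(), dim=1)    # (P, P) spatial affinity between pedestrians
        pooled = affinity @ motion                    # weight each neighbor's motion by affinity
        return self.out(pooled)                       # (P, 2) predicted displacements

model = CrowdInteraction()
pred = model(torch.randn(6, 8, 2))                    # 6 pedestrians, 8 observed frames
print(pred.shape)                                     # torch.Size([6, 2])
```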
{"title":"Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction","authors":"Yanyu Xu, Zhixin Piao, Shenghua Gao","doi":"10.1109/CVPR.2018.00553","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00553","url":null,"abstract":"Pedestrian trajectory prediction is a challenging task because of the complex nature of humans. In this paper, we tackle the problem within a deep learning framework by considering motion information of each pedestrian and its interaction with the crowd. Specifically, motivated by the residual learning in deep learning, we propose to predict displacement between neighboring frames for each pedestrian sequentially. To predict such displacement, we design a crowd interaction deep neural network (CIDNN) which considers the different importance of different pedestrians for the displacement prediction of a target pedestrian. Specifically, we use an LSTM to model motion information for all pedestrians and use a multi-layer perceptron to map the location of each pedestrian to a high dimensional feature space where the inner product between features is used as a measurement for the spatial affinity between two pedestrians. Then we weight the motion features of all pedestrians based on their spatial affinity to the target pedestrian for location displacement prediction. Extensive experiments on publicly available datasets validate the effectiveness of our method for trajectory prediction.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"4 1","pages":"5275-5284"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89367627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 207