
Latest Publications: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Automatic and Robust Skull Registration Based on Discrete Uniformization
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00052
Junli Zhao, Xin Qi, Chengfeng Wen, Na Lei, X. Gu
Skull registration plays a fundamental role in forensic science and is crucial for craniofacial reconstruction. The complicated topology, lack of anatomical features, and low-quality reconstructed meshes make skull registration challenging. In this work, we propose an automatic skull registration method based on the discrete uniformization theory, which can handle complicated topologies and is robust to low-quality meshes. We apply dynamic Yamabe flow to realize discrete uniformization, which modifies the mesh combinatorial structure during the flow and conformally maps the multiply connected skull surface onto a planar disk with circular holes. The 3D surfaces can then be registered by matching their planar images using harmonic maps. The method is theoretically rigorous, automatic without user intervention, and robust to low mesh quality. Our experimental results demonstrate the efficiency and efficacy of the method.
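The mapping-and-matching idea is easiest to see in miniature. Below is a hedged sketch of one ingredient the abstract names, a discrete harmonic map that flattens a toy disk-like triangle mesh onto the unit disk by fixing boundary vertices on a circle and solving a Laplace system for the interior (uniform weights, i.e. a Tutte-style embedding); the dynamic Yamabe flow, multiply connected surfaces, and the registration step are not reproduced here, and the fan mesh below is made up for illustration.

```python
# Hedged sketch: flatten a tiny fan mesh onto the unit disk with a discrete
# harmonic map (uniform weights). Not the authors' pipeline.
import numpy as np

# Toy mesh: a centre vertex (0) connected to 6 boundary vertices forming a ring.
edges = [(0, i) for i in range(1, 7)] + [(i, i % 6 + 1) for i in range(1, 7)]
boundary = list(range(1, 7))
n = 7

# Fix boundary vertices uniformly on the unit circle.
uv = np.zeros((n, 2))
for k, b in enumerate(boundary):
    ang = 2 * np.pi * k / len(boundary)
    uv[b] = [np.cos(ang), np.sin(ang)]

# Graph Laplacian: harmonic interior vertices are averages of their neighbours.
L = np.zeros((n, n))
for i, j in edges:
    L[i, j] = L[j, i] = -1.0
np.fill_diagonal(L, -L.sum(axis=1))

interior = [v for v in range(n) if v not in boundary]
A = L[np.ix_(interior, interior)]
b = -L[np.ix_(interior, boundary)] @ uv[boundary]
uv[interior] = np.linalg.solve(A, b)

print(uv[0])   # the interior vertex lands at the centroid of its neighbours (~origin)
```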
Citations: 6
Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00317
Xin Deng, Ren Yang, Mai Xu, P. Dragotti
In single image super-resolution (SISR), given a low-resolution (LR) image, one wishes to find a high-resolution (HR) version of it which is both accurate and photorealistic. Recently, it has been shown that there exists a fundamental tradeoff between low distortion and high perceptual quality, and the generative adversarial network (GAN) has been demonstrated to approach the perception-distortion (PD) bound effectively. In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN-based methods. Specifically, we propose to use the 2D stationary wavelet transform (SWT) to decompose one image into low-frequency and high-frequency sub-bands. For the low-frequency sub-band, we improve its objective quality through an enhancement network. For the high-frequency sub-bands, we propose to use WDST to effectively improve their perceptual quality. By virtue of the perfect reconstruction property of wavelets, these sub-bands can be re-combined to obtain an image with simultaneously high objective and perceptual quality. Numerical results on various datasets show that our method achieves the best trade-off between distortion and perceptual quality among existing state-of-the-art SISR methods.
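As a concrete illustration of the sub-band split the abstract describes, the following sketch (not the authors' code) uses PyWavelets to decompose an image with a 2D stationary wavelet transform, leaves placeholders where the enhancement network and WDST would act, and recombines the sub-bands with the inverse transform; the random image and the `enhance_net`/`wdst` names are illustrative assumptions.

```python
# Minimal SWT split/recombine sketch with pywt; the two processing branches
# are placeholders standing in for the networks described in the abstract.
import numpy as np
import pywt

img = np.random.rand(256, 256)                       # stand-in luminance channel
coeffs = pywt.swt2(img, wavelet="haar", level=1)      # [(cA, (cH, cV, cD))]
cA, (cH, cV, cD) = coeffs[0]

# Hypothetical branches: an enhancement network would refine cA (objective
# quality); wavelet-domain style transfer would refine cH/cV/cD (perceptual).
cA_enhanced = cA                                      # hypothetical: enhance_net(cA)
cH_st, cV_st, cD_st = cH, cV, cD                      # hypothetical: wdst(cH, cV, cD)

recon = pywt.iswt2([(cA_enhanced, (cH_st, cV_st, cD_st))], wavelet="haar")
assert np.allclose(recon, img)                        # perfect reconstruction when sub-bands are unchanged
```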
Citations: 59
Action Assessment by Joint Relation Graphs
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00643
Jiahui Pan, Jibin Gao, Weishi Zheng
We present a new model to assess the performance of actions from videos, through graph-based joint relation modelling. Previous works mainly focused on the whole scene, including the performer's body and background, yet they ignored the detailed joint interactions. This is insufficient for fine-grained, accurate action assessment, because the action quality of each joint is dependent on its neighbouring joints. Therefore, we propose to learn the detailed joint motion based on the joint relations. We build trainable Joint Relation Graphs, and analyze joint motion on them. We propose two novel modules, the Joint Commonality Module and the Joint Difference Module, for joint motion learning. The Joint Commonality Module models the general motion for certain body parts, and the Joint Difference Module models the motion differences within body parts. We evaluate our method on six public Olympic actions for performance assessment. Our method outperforms previous approaches (+0.0912) and the whole-scene analysis (+0.0623) in Spearman's Rank Correlation. We also demonstrate our model's ability to interpret the action assessment process.
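The reported gains are measured with Spearman's rank correlation between predicted and ground-truth action-quality scores; a minimal sketch of that metric is shown below, with made-up score values purely for illustration.

```python
# Spearman's rank correlation between judge scores and model predictions,
# the evaluation metric cited in the abstract. Numbers are illustrative.
from scipy.stats import spearmanr

ground_truth = [71.2, 64.5, 88.0, 55.3, 79.9, 91.4]   # hypothetical judge scores
predictions  = [68.0, 66.1, 85.2, 58.7, 81.0, 89.3]   # hypothetical model outputs

rho, p_value = spearmanr(ground_truth, predictions)
print(f"Spearman's rank correlation: {rho:.4f} (p = {p_value:.4f})")
```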
Citations: 68
Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00320
Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma
How to effectively fuse temporal information from consecutive frames plays an important role in video super-resolution (SR), yet most previous fusion strategies either fail to fully utilize temporal information or cost too much time. In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and is proved to be more efficient and effective than the existing direct fusion, slow fusion or 3D convolution strategies. Under this progressive fusion framework, we further introduce an improved non-local operation to avoid the complex motion estimation and motion compensation (ME&MC) procedures used in previous video SR approaches. Extensive experiments on public datasets demonstrate that our method surpasses the state of the art by 0.96 dB on average and runs about 3 times faster, while requiring only about half of the parameters.
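For readers unfamiliar with non-local operations, the sketch below shows a generic embedded-Gaussian non-local block in PyTorch; the paper's improved variant differs in details that are not reproduced here, and the channel and spatial sizes are illustrative.

```python
# Generic non-local block sketch (embedded Gaussian), not the paper's variant.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2D(nn.Module):
    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)        # (B, HW, C')
        k = self.phi(x).flatten(2)                          # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)            # (B, HW, C')
        attn = F.softmax(q @ k, dim=-1)                     # affinities between all positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                              # residual connection

feat = torch.randn(1, 64, 32, 32)
print(NonLocalBlock2D(64)(feat).shape)                      # torch.Size([1, 64, 32, 32])
```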
Citations: 176
Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00504
Yiyou Sun, Sathya Ravi, Vikas Singh
There is a growing interest in strategies that can help us understand or interpret neural networks -- that is, not merely provide a prediction, but also offer additional context explaining why and how. While many current methods offer tools to perform this analysis post hoc for a given (trained) network, recent results (especially on capsule networks) suggest that when classes map to a few high-level ``concepts'' in the preceding layers of the network, the behavior of the network is easier to interpret or explain. Such training may be accomplished via dynamic/EM routing, where the network ``routes'' for individual classes (or subsets of images) are dynamic and involve few nodes even if the full network may not be sparse. In this paper, we show how a simple modification of the SGD scheme can help provide dynamic/EM routing type behavior in convolutional neural networks. Through extensive experiments, we evaluate the effect of this idea on interpretability, where we obtain promising results, while also showing that no compromise in attainable accuracy is involved. Further, although the modification is seemingly ad hoc, we show that the new algorithm can be analyzed by an approximate method which provably matches known rates for SGD.
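As a loose, hypothetical illustration of activation-thresholding behavior (not the authors' algorithm or their SGD modification), the sketch below keeps only the k most strongly activated channels per sample, yielding the kind of sparse, routing-like paths the abstract alludes to.

```python
# Hypothetical illustration only: keep the top-k channels per sample so that
# each input activates a sparse, routing-like subset of the layer.
import torch

def threshold_channels(features, k):
    """Zero out all but the top-k channels (by mean activation) per sample."""
    b, c, h, w = features.shape
    strength = features.mean(dim=(2, 3))                 # (B, C) channel activation strength
    topk_idx = strength.topk(k, dim=1).indices           # indices of the k strongest channels
    mask = torch.zeros_like(strength).scatter_(1, topk_idx, 1.0)
    return features * mask.view(b, c, 1, 1)

x = torch.relu(torch.randn(2, 16, 8, 8))
sparse = threshold_channels(x, k=4)
print((sparse.abs().sum(dim=(2, 3)) > 0).sum(dim=1))     # -> tensor([4, 4]): four active channels per sample
```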
Citations: 14
Controllable Attention for Structured Layered Video Decomposition
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00583
Jean-Baptiste Alayrac, João Carreira, Relja Arandjelovi'c, Andrew Zisserman
The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. For example, to be able to separate reflections, transparency or object motion. We make the following three contributions: (i) we introduce a new structured neural network architecture that explicitly incorporates layers (as spatial masks) into its design. This improves separation performance over previous general purpose networks for this task; (ii) we demonstrate that we can augment the architecture to leverage external cues such as audio for controllability and to help disambiguation; and (iii) we experimentally demonstrate the effectiveness of our approach and training procedure with controlled experiments, while also showing that the proposed model can be successfully applied to real-world applications such as reflection removal and action recognition in cluttered scenes.
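The layered model underlying the paper can be illustrated with a small compositing sketch: a frame is explained as layers weighted by per-pixel spatial masks, and the network's job is to recover such layers and masks. The tensors below are illustrative placeholders, not the paper's architecture.

```python
# Forward compositing step of a layered image model; decomposition would
# invert this by predicting the layers and masks from video.
import torch

def composite(layers, masks):
    """layers: (L, C, H, W); masks: (L, 1, H, W), normalised over layers."""
    weights = torch.softmax(masks, dim=0)        # each pixel distributes weight over layers
    return (weights * layers).sum(dim=0)         # (C, H, W) reconstructed frame

layers = torch.rand(2, 3, 64, 64)                # e.g. a transmission layer and a reflection layer
masks = torch.randn(2, 1, 64, 64)                # unnormalised per-layer spatial masks
frame = composite(layers, masks)
print(frame.shape)                               # torch.Size([3, 64, 64])
```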
Citations: 8
Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00173
Haoang Li, Ji Zhao, J. Bazin, Wen Chen, Zhe Liu, Yunhui Liu
The image lines projected from parallel 3D lines intersect at a common point called the vanishing point (VP). Manhattan world holds for scenes with three orthogonal VPs. In Manhattan world, given several lines in a calibrated image, we aim at clustering them by three unknown-but-sought VPs. The VP estimation can be reformulated as computing the rotation between the Manhattan frame and the camera frame. To compute this rotation, state-of-the-art methods are based on either data sampling or parameter search, and they fail to guarantee accuracy and efficiency simultaneously. In contrast, we propose to hybridize these two strategies. We first compute two degrees of freedom (DOF) of the above rotation from two sampled image lines, and then search for the optimal third DOF based on branch-and-bound. Our sampling accelerates our search by reducing the search space and simplifying the bound computation. Our search is not sensitive to noise and achieves quasi-global optimality in terms of maximizing the number of inliers. Experiments on synthetic and real-world images show that our method outperforms state-of-the-art approaches in terms of accuracy and/or efficiency.
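The branch-and-bound component can be illustrated on a toy 1-DOF problem. The hedged sketch below searches a single rotation angle by splitting an interval and pruning with an interval upper bound on the inlier count; the 2D point-alignment setup is illustrative and is not the paper's actual vanishing-point formulation.

```python
# Toy 1-DOF branch-and-bound: find a 2D rotation angle maximising inliers.
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((50, 2))                          # source points
true_theta = 0.8
R = np.array([[np.cos(true_theta), -np.sin(true_theta)],
              [np.sin(true_theta),  np.cos(true_theta)]])
Q = P @ R.T + 0.01 * rng.standard_normal((50, 2))         # rotated points plus noise

def residuals(theta):
    Rt = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
    return np.linalg.norm(P @ Rt.T - Q, axis=1)

def inliers(theta, tol=0.05):
    return int((residuals(theta) <= tol).sum())

def upper_bound(lo, hi, tol=0.05):
    # Each residual is Lipschitz in theta with constant ||p_i||, which bounds
    # the best achievable residual anywhere in [lo, hi] from below.
    half = (hi - lo) / 2.0
    slack = np.linalg.norm(P, axis=1) * half
    return int((residuals((lo + hi) / 2.0) - slack <= tol).sum())

best_theta, best_count = 0.0, -1
stack = [(-np.pi, np.pi)]
while stack:
    lo, hi = stack.pop()
    if upper_bound(lo, hi) <= best_count:
        continue                                          # prune: cannot beat current best
    mid = (lo + hi) / 2.0
    if inliers(mid) > best_count:
        best_theta, best_count = mid, inliers(mid)
    if hi - lo > 1e-3:
        stack += [(lo, mid), (mid, hi)]

print(best_theta, best_count)                             # close to 0.8 with ~50 inliers
```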
Citations: 25
Mixture-Kernel Graph Attention Network for Situation Recognition
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01046
M. Suhail, L. Sigal
Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables a dynamic graph structure during training and inference, through the use of a graph attention mechanism and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on the imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.
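For context, the sketch below shows a plain single-head graph attention layer in PyTorch, the generic building block the abstract extends; the mixture-kernel and context-aware role-pair interactions of the paper are not reproduced, and all sizes are illustrative.

```python
# Single-head graph attention layer sketch (generic GAT-style), not the
# paper's mixture-kernel variant.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):                   # x: (N, in_dim), adj: (N, N) 0/1 mask
        h = self.proj(x)                         # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))       # raw attention logits (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))            # only attend along graph edges
        alpha = torch.softmax(e, dim=-1)                       # normalised attention per node
        return alpha @ h                                       # aggregated neighbour features

nodes = torch.randn(6, 16)                        # e.g. role/entity node embeddings
adj = (torch.rand(6, 6) > 0.5).float()
adj.fill_diagonal_(1.0)                           # keep self-loops so every row has an edge
out = GraphAttentionLayer(16, 32)(nodes, adj)
print(out.shape)                                  # torch.Size([6, 32])
```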
Citations: 22
Learning Meshes for Dense Visual SLAM
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00595
Michael Bloesch, Tristan Laidlow, R. Clark, Stefan Leutenegger, A. Davison
Estimating the motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information theoretic point of view, estimates should get better as more information is included, as is done in dense SLAM, but this is strongly dependent on the validity of the underlying models. In the present paper, we use triangular meshes as both a compact and dense geometry representation. To allow for simple and fast usage, we propose a view-based formulation for which we predict the in-plane vertex coordinates directly from images and then employ the remaining vertex depth components as free variables. Flexible and continuous integration of information is achieved through the use of a residual based inference technique. This so-called factor graph encodes all information as a mapping from free variables to residuals, the squared sum of which is minimised during inference. We propose the use of different types of learnable residuals, which are trained end-to-end to increase their suitability as information bearing models and to enable accurate and reliable estimation. Detailed evaluation of all components is provided on both synthetic and real data, which confirms the practicability of the presented approach.
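The residual-based inference the abstract describes can be illustrated with a tiny factor-graph-style least-squares problem: free variables (a few vertex depths) map to residuals whose squared sum is minimised, here with hand-written measurement and smoothness factors and a plain Gauss-Newton loop rather than the paper's learned residuals.

```python
# Minimal factor-graph-style least squares over "vertex depths"; the factors
# and values are made up for illustration.
import numpy as np

measured = np.array([2.0, 2.2, 1.9, 2.1])     # hypothetical per-vertex depth measurements
lam = 0.5                                     # weight of the smoothness factors

def residuals(d):
    meas = d - measured                       # measurement factors: match observed depths
    smooth = lam * (d[1:] - d[:-1])           # smoothness factors: neighbouring depths agree
    return np.concatenate([meas, smooth])

def jacobian(d):
    n = d.size
    J_meas = np.eye(n)
    J_smooth = lam * (np.eye(n)[1:] - np.eye(n)[:-1])
    return np.vstack([J_meas, J_smooth])

d = np.zeros(4)                               # free variables (vertex depths), poor initialisation
for _ in range(10):                           # Gauss-Newton iterations
    r, J = residuals(d), jacobian(d)
    d = d - np.linalg.solve(J.T @ J, J.T @ r)

print(d, 0.5 * (residuals(d) ** 2).sum())     # depths pulled toward measurements and smoothed
```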
Citations: 12
C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00993
Gao Yan, B. Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, Dongrui Fan
Weakly supervised object detection (WSOD), which only needs image-level annotations, has attracted much attention recently. By combining convolutional neural networks with multiple instance learning, the Multiple Instance Detection Network (MIDN) has become the most popular method to address the WSOD problem and has been adopted as the initial model in many works. We argue that MIDN tends to converge to the most discriminative object parts, which limits the performance of methods based on it. In this paper, we propose a novel Coupled Multiple Instance Detection Network (C-MIDN) to address this problem. Specifically, we use a pair of MIDNs, which work in a complementary manner with proposal removal. The localization information of the MIDNs is further coupled to obtain tighter bounding boxes and localize multiple objects. We also introduce a Segmentation Guided Proposal Removal (SGPR) algorithm to guarantee the MIL constraint after the removal and ensure the robustness of C-MIDN. Through a simple implementation of C-MIDN with online detector refinement, we obtain 53.6% and 50.3% mAP on the challenging PASCAL VOC 2007 and 2012 benchmarks respectively, which significantly outperforms the previous state of the art.
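The baseline MIDN scoring that C-MIDN builds on follows the familiar two-stream multiple-instance formulation; a minimal PyTorch sketch is given below, with the coupling, proposal removal, and segmentation guidance of C-MIDN deliberately omitted and all sizes illustrative.

```python
# Two-stream MIDN scoring sketch: softmax over classes (classification
# stream) times softmax over proposals (detection stream), summed over
# proposals to get image-level scores for the MIL loss.
import torch
import torch.nn as nn

class BasicMIDN(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.cls_stream = nn.Linear(feat_dim, num_classes)
        self.det_stream = nn.Linear(feat_dim, num_classes)

    def forward(self, proposal_feats):                               # (num_proposals, feat_dim)
        cls = torch.softmax(self.cls_stream(proposal_feats), dim=1)  # which class for each proposal
        det = torch.softmax(self.det_stream(proposal_feats), dim=0)  # which proposal for each class
        scores = cls * det                                           # per-proposal, per-class scores
        return scores, scores.sum(dim=0)                             # image-level scores

feats = torch.randn(300, 4096)                     # e.g. 300 region-proposal features
midn = BasicMIDN(feat_dim=4096, num_classes=20)    # 20 PASCAL VOC classes
proposal_scores, image_scores = midn(feats)
print(proposal_scores.shape, image_scores.shape)   # (300, 20) and (20,)
```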
Citations: 96