
2019 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00553
Haiyong Jiang, Jianfei Cai, Jianmin Zheng
This work addresses the problem of 3D human shape reconstruction from point clouds. Considering that human shapes are high-dimensional and highly articulated, we adopt the state-of-the-art parametric human body model, SMPL, to reduce the dimension of the learning space and generate smooth and valid reconstructions. However, SMPL parameters, especially pose parameters, are not easy to learn because of the ambiguity and locality of the pose representation. Thus, we propose to incorporate skeleton awareness into the deep learning based regression of SMPL parameters for 3D human shape reconstruction. Our basic idea is to use the state-of-the-art technique PointNet++ to extract point features, and then map point features to skeleton joint features and finally to SMPL parameters for the reconstruction from point clouds. In particular, we develop an end-to-end framework, where we propose a graph aggregation module to augment PointNet++ by extracting better point features, an attention module to better map unordered point features into ordered skeleton joint features, and a skeleton graph module to extract better joint features for SMPL parameter regression. The entire framework is first trained end-to-end on a synthesized dataset, and then fine-tuned online on unseen datasets with an unsupervised loss to bridge the gap between training and testing. Experiments on multiple datasets show that our method is on par with the state-of-the-art solutions.
{"title":"Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds","authors":"Haiyong Jiang, Jianfei Cai, Jianmin Zheng","doi":"10.1109/ICCV.2019.00553","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00553","url":null,"abstract":"This work addresses the problem of 3D human shape reconstruction from point clouds. Considering that human shapes are of high dimensions and with large articulations, we adopt the state-of-the-art parametric human body model, SMPL, to reduce the dimension of learning space and generate smooth and valid reconstruction. However, SMPL parameters, especially pose parameters, are not easy to learn because of ambiguity and locality of the pose representation. Thus, we propose to incorporate skeleton awareness into the deep learning based regression of SMPL parameters for 3D human shape reconstruction. Our basic idea is to use the state-of-the-art technique PointNet++ to extract point features, and then map point features to skeleton joint features and finally to SMPL parameters for the reconstruction from point clouds. Particularly, we develop an end-to-end framework, where we propose a graph aggregation module to augment PointNet++ by extracting better point features, an attention module to better map unordered point features into ordered skeleton joint features, and a skeleton graph module to extract better joint features for SMPL parameter regression. The entire framework network is first trained in an end-to-end manner on synthesized dataset, and then online fine-tuned on unseen dataset with unsupervised loss to bridges gaps between training and testing. The experiments on multiple datasets show that our method is on par with the state-of-the-art solution.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"74 1","pages":"5430-5440"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86156986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
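The attention module in the abstract above maps unordered point features to ordered skeleton joint features. Below is a minimal sketch of one common way to realize such a mapping, with learned per-joint queries attending over PointNet++ point features; the layer sizes, the 24-joint SMPL skeleton, and the class name are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PointToJointAttention(nn.Module):
    """Hypothetical attention layer: learned joint queries attend over
    unordered point features and return ordered per-joint features."""
    def __init__(self, feat_dim=128, num_joints=24):
        super().__init__()
        self.joint_queries = nn.Parameter(torch.randn(num_joints, feat_dim))
        self.key = nn.Linear(feat_dim, feat_dim)
        self.value = nn.Linear(feat_dim, feat_dim)

    def forward(self, point_feats):                # (B, N, C) from PointNet++
        k = self.key(point_feats)                  # (B, N, C)
        v = self.value(point_feats)                # (B, N, C)
        q = self.joint_queries.unsqueeze(0)        # (1, J, C), broadcast over batch
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v                            # (B, J, C) ordered joint features
```

The resulting (B, J, C) joint features could then be flattened and fed to a small MLP that regresses SMPL pose and shape parameters.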
A Dataset of Multi-Illumination Images in the Wild
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00418
Lukas Murmann, Michaël Gharbi, M. Aittala, F. Durand
Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multi-illumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources, or robotic gantries. This leads to image collections that are not representative of the variety and complexity of real world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.
{"title":"A Dataset of Multi-Illumination Images in the Wild","authors":"Lukas Murmann, Michaël Gharbi, M. Aittala, F. Durand","doi":"10.1109/ICCV.2019.00418","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00418","url":null,"abstract":"Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multi-illumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources, or robotic gantries. This leads to image collections that are not representative of the variety and complexity of real world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"108 1","pages":"4079-4088"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77220678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00125
Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-cheng Chen, Jian Yin
Beyond current image-based virtual try-on systems that have attracted increasing attention, we move a step forward to developing a video virtual try-on system that precisely transfers clothes onto the person and generates visually realistic videos conditioned on arbitrary poses. Besides the challenges in image-based virtual try-on (e.g., clothes fidelity, image synthesis), video virtual try-on further requires spatiotemporal consistency. Directly adopting existing image-based approaches often fails to generate coherent video with natural and realistic textures. In this work, we propose the Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize a video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. FW-GAN aims to synthesize a coherent and natural video while manipulating the pose and clothes. It consists of: (i) a flow-guided fusion module that warps the past frames to assist synthesis, which is also adopted in the discriminator to help enhance the coherence and quality of the synthesized video; (ii) a warping net designed to warp the clothes image to refine clothes textures; (iii) a parsing constraint loss that alleviates the problem caused by the misalignment of segmentation maps across images with different poses and various clothes. Experiments on our newly collected dataset show that FW-GAN can synthesize high-quality videos of virtual try-on and significantly outperforms other methods both qualitatively and quantitatively.
{"title":"FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On","authors":"Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-cheng Chen, Jian Yin","doi":"10.1109/ICCV.2019.00125","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00125","url":null,"abstract":"Beyond current image-based virtual try-on systems that have attracted increasing attention, we move a step forward to developing a video virtual try-on system that precisely transfers clothes onto the person and generates visually realistic videos conditioned on arbitrary poses. Besides the challenges in image-based virtual try-on (e.g., clothes fidelity, image synthesis), video virtual try-on further requires spatiotemporal consistency. Directly adopting existing image-based approaches often fails to generate coherent video with natural and realistic textures. In this work, we propose Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize the video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. FW-GAN aims to synthesize the coherent and natural video while manipulating the pose and clothes. It consists of: (i) a flow-guided fusion module that warps the past frames to assist synthesis, which is also adopted in the discriminator to help enhance the coherence and quality of the synthesized video; (ii) a warping net that is designed to warp clothes image for the refinement of clothes textures; (iii) a parsing constraint loss that alleviates the problem caused by the misalignment of segmentation maps from images with different poses and various clothes. Experiments on our newly collected dataset show that FW-GAN can synthesize high-quality video of virtual try-on and significantly outperforms other methods both qualitatively and quantitatively.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1161-1170"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88902256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 72
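The flow-guided fusion module described above warps past frames before fusing them into the synthesis. The warping step itself is typically implemented by sampling the past frame along a dense flow field, as in the hedged sketch below; the pixel-offset flow convention and bilinear sampling are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def flow_warp(past_frame, flow):
    """Warp a past frame with a dense flow field so it aligns with the
    current frame. past_frame: (B, C, H, W); flow: (B, 2, H, W) in pixels."""
    b, _, h, w = past_frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(past_frame)   # (2, H, W) pixel coords
    coords = base.unsqueeze(0) + flow                    # (B, 2, H, W)
    # normalize coordinates to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(past_frame, grid, align_corners=True)
```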
Deep Single-Image Portrait Relighting
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00729
Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, D. Jacobs
Conventional physically-based methods for relighting portrait images need to solve an inverse rendering problem, estimating face geometry, reflectance and lighting. However, inaccurate estimation of face components can cause strong artifacts in relighting, leading to unsatisfactory results. In this work, we apply a physically-based portrait relighting method to generate a large-scale, high-quality, “in the wild” portrait relighting dataset (DPR). A deep Convolutional Neural Network (CNN) is then trained on this dataset to generate a relit portrait image from a source image and a target lighting as input. The training procedure regularizes the generated results, removing the artifacts caused by physically-based relighting methods. A GAN loss is further applied to improve the quality of the relit portrait image. Our trained network can relight portrait images with resolutions as high as 1024 × 1024. We evaluate the proposed method on the DPR dataset, the Flickr portrait dataset and the Multi-PIE dataset, both qualitatively and quantitatively. Our experiments demonstrate that the proposed method achieves state-of-the-art results. Please refer to https://zhhoper.github.io/dpr.html for dataset and code.
{"title":"Deep Single-Image Portrait Relighting","authors":"Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, D. Jacobs","doi":"10.1109/ICCV.2019.00729","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00729","url":null,"abstract":"Conventional physically-based methods for relighting portrait images need to solve an inverse rendering problem, estimating face geometry, reflectance and lighting. However, the inaccurate estimation of face components can cause strong artifacts in relighting, leading to unsatisfactory results. In this work, we apply a physically-based portrait relighting method to generate a large scale, high quality, “in the wild” portrait relighting dataset (DPR). A deep Convolutional Neural Network (CNN) is then trained using this dataset to generate a relit portrait image by using a source image and a target lighting as input. The training procedure regularizes the generated results, removing the artifacts caused by physically-based relighting methods. A GAN loss is further applied to improve the quality of the relit portrait image. Our trained network can relight portrait images with resolutions as high as 1024 × 1024. We evaluate the proposed method on the proposed DPR datset, Flickr portrait dataset and Multi-PIE dataset both qualitatively and quantitatively. Our experiments demonstrate that the proposed method achieves state-of-the-art results. Please refer to https://zhhoper.github.io/dpr.html for dataset and code.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"4 1","pages":"7193-7201"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90566788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 154
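DPR conditions the relighting network on a target lighting. A common pattern for injecting a low-dimensional lighting code into a convolutional generator is to broadcast it spatially and concatenate it with the image; the toy sketch below shows only that conditioning pattern, assuming a 9-coefficient spherical-harmonics lighting and made-up layer sizes, and is not the paper's architecture.

```python
import torch
import torch.nn as nn

class LightingConditionedNet(nn.Module):
    """Toy generator: a source image plus a target SH lighting vector in,
    a relit image out. Sizes are placeholders, not the DPR network."""
    def __init__(self, sh_dim=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + sh_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, image, target_sh):           # (B, 3, H, W), (B, sh_dim)
        b, _, h, w = image.shape
        sh_map = target_sh.view(b, -1, 1, 1).expand(b, target_sh.shape[1], h, w)
        return self.net(torch.cat((image, sh_map), dim=1))
```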
Meta-Learning to Detect Rare Objects
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01002
Yu-Xiong Wang, Deva Ramanan, M. Hebert
Few-shot learning, i.e., learning novel concepts from few examples, is fundamental to practical visual recognition systems. While most existing work has focused on few-shot classification, we make a step towards few-shot object detection, a more challenging yet under-explored task. We develop a conceptually simple but powerful meta-learning based framework that simultaneously tackles few-shot classification and few-shot localization in a unified, coherent way. This framework leverages meta-level knowledge about "model parameter generation" from base classes with abundant data to facilitate the generation of a detector for novel classes. Our key insight is to disentangle the learning of category-agnostic and category-specific components in a CNN-based detection model. In particular, we introduce a weight prediction meta-model that enables predicting the parameters of category-specific components from few examples. We systematically benchmark the performance of modern detectors in the small-sample-size regime. Experiments in a variety of realistic scenarios, including within-domain, cross-domain, and long-tailed settings, demonstrate the effectiveness and generality of our approach under different notions of novel classes.
{"title":"Meta-Learning to Detect Rare Objects","authors":"Yu-Xiong Wang, Deva Ramanan, M. Hebert","doi":"10.1109/ICCV.2019.01002","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01002","url":null,"abstract":"Few-shot learning, i.e., learning novel concepts from few examples, is fundamental to practical visual recognition systems. While most of existing work has focused on few-shot classification, we make a step towards few-shot object detection, a more challenging yet under-explored task. We develop a conceptually simple but powerful meta-learning based framework that simultaneously tackles few-shot classification and few-shot localization in a unified, coherent way. This framework leverages meta-level knowledge about \"model parameter generation\" from base classes with abundant data to facilitate the generation of a detector for novel classes. Our key insight is to disentangle the learning of category-agnostic and category-specific components in a CNN based detection model. In particular, we introduce a weight prediction meta-model that enables predicting the parameters of category-specific components from few examples. We systematically benchmark the performance of modern detectors in the small-sample size regime. Experiments in a variety of realistic scenarios, including within-domain, cross-domain, and long-tailed settings, demonstrate the effectiveness and generality of our approach under different notions of novel classes.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"92 1","pages":"9924-9933"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85682020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 223
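The weight prediction meta-model turns a handful of support examples of a novel class into category-specific parameters. Below is a minimal sketch of that idea, assuming averaged support features and a cosine-style classifier; all names and sizes are hypothetical, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Predict a classifier weight vector for a novel class from K support
    features; a sketch of the weight-prediction idea only."""
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, support_feats, query_feats):
        # support_feats: (K, D) few examples of the novel class
        # query_feats:   (M, D) category-agnostic features of candidate boxes
        w = self.gen(support_feats.mean(dim=0, keepdim=True))   # (1, D)
        w = F.normalize(w, dim=-1)
        return query_feats @ w.t()                              # (M, 1) class scores
```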
G3raphGround: Graph-Based Language Grounding
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00438
Mohit Bajaj, Lanjun Wang, L. Sigal
In this paper we present an end-to-end framework for grounding phrases in images. In contrast to previous works, our model, which we call GraphGround, uses graphs to formulate more complex, non-sequential dependencies among proposal image regions and phrases. We capture intra-modal dependencies using a separate graph neural network for each modality (visual and lingual), and then use conditional message-passing in another graph neural network to fuse their outputs and capture cross-modal relationships. This final representation yields the grounding decisions. The framework supports many-to-many matching and is able to ground a single phrase to multiple image regions and vice versa. We validate our design choices through a series of ablation studies and demonstrate state-of-the-art performance on the Flickr30k and ReferIt Game benchmark datasets.
{"title":"G3raphGround: Graph-Based Language Grounding","authors":"Mohit Bajaj, Lanjun Wang, L. Sigal","doi":"10.1109/ICCV.2019.00438","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00438","url":null,"abstract":"In this paper we present an end-to-end framework for grounding of phrases in images. In contrast to previous works, our model, which we call GraphGround, uses graphs to formulate more complex, non-sequential dependencies among proposal image regions and phrases. We capture intra-modal dependencies using a separate graph neural network for each modality (visual and lingual), and then use conditional message-passing in another graph neural network to fuse their outputs and capture cross-modal relationships. This final representation results in grounding decisions. The framework supports many-to-many matching and is able to ground single phrase to multiple image regions and vice versa. We validate our design choices through a series of ablation studies and illustrate state-of-the-art performance on Flickr30k and ReferIt Game benchmark datasets.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"14 1","pages":"4280-4289"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74723379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
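The model above runs message passing within per-modality graphs and then across modalities. The sketch below shows a single generic message-passing round over a fully connected graph, the basic building block such a design rests on; the GRU update and mean aggregation are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of message passing over a fully connected graph of node
    states; a simplified stand-in for the graph modules in GraphGround."""
    def __init__(self, dim=256):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.update = nn.GRUCell(dim, dim)

    def forward(self, nodes):                       # (N, D) node states
        n, d = nodes.shape
        send = nodes.unsqueeze(1).expand(n, n, d)   # sender state for edge (i, j)
        recv = nodes.unsqueeze(0).expand(n, n, d)   # receiver state for edge (i, j)
        msgs = self.msg(torch.cat((send, recv), dim=-1)).mean(dim=0)  # aggregate per receiver
        return self.update(msgs, nodes)             # (N, D) updated node states
```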
Non-Local ConvLSTM for Video Compression Artifact Reduction
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00714
Yi Xu, Longwen Gao, Kai Tian, Shuigeng Zhou, Huyang Sun
Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Furthermore, as frames of high overall quality may contain low-quality patches, and high-quality patches may exist in frames of low overall quality, current methods focusing on nearby peak-quality frames (PQFs) may miss high-quality details in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. An approximate non-local strategy is introduced in NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal dependency in a video sequence. This approximation makes the non-local module fast and memory-efficient. Our method uses the preceding and following frames of the target frame to generate a residual, from which a higher-quality frame is reconstructed. Experiments on two datasets show that NL-ConvLSTM outperforms existing methods.
{"title":"Non-Local ConvLSTM for Video Compression Artifact Reduction","authors":"Yi Xu, Longwen Gao, Kai Tian, Shuigeng Zhou, Huyang Sun","doi":"10.1109/ICCV.2019.00714","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00714","url":null,"abstract":"Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Furthermore, as frames of high quality overall may contain low-quality patches, and high-quality patches may exist in frames of low quality overall, current methods focusing on nearby peak-quality frames (PQFs) may miss high-quality details in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. An approximate non-local strategy is introduced in NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal dependency in a video sequence. This approximate strategy makes the non-local module work in a fast and low space-cost way. Our method uses the preceding and following frames of the target frame to generate a residual, from which a higher quality frame is reconstructed. Experiments on two datasets show that NL-ConvLSTM outperforms the existing methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"8 1","pages":"7042-7051"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74632417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
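NL-ConvLSTM builds on the non-local operation. For reference, the standard embedded-Gaussian non-local block over a single feature map looks like the sketch below; the paper's contribution is an approximate, cross-frame variant of this idea, which is not reproduced here.

```python
import torch
import torch.nn as nn

class NonLocalBlock2D(nn.Module):
    """Standard embedded-Gaussian non-local block over one feature map."""
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2                             # assumes channels >= 2
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):                                 # (B, C, H, W)
        b, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        phi = self.phi(x).flatten(2)                      # (B, C', HW)
        g = self.g(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        attn = torch.softmax(theta @ phi, dim=-1)         # (B, HW, HW) pairwise weights
        y = (attn @ g).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                            # residual connection
```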
MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00875
Quan Kong, Ziming Wu, Ziwei Deng, Martin Klinkigt, Bin Tong, Tomokazu Murakami
Unlike vision modalities, body-worn sensors or passive sensing can avoid the failures of action understanding caused by vision-related challenges, e.g. occlusion and appearance variation. However, a standard large-scale dataset in which different types of modalities across vision and sensors are integrated does not exist. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves the performance of action recognition compared to models trained with only RGB information. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
{"title":"MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding","authors":"Quan Kong, Ziming Wu, Ziwei Deng, Martin Klinkigt, Bin Tong, Tomokazu Murakami","doi":"10.1109/ICCV.2019.00875","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00875","url":null,"abstract":"Unlike vision modalities, body-worn sensors or passive sensing can avoid the failure of action understanding in vision related challenges, e.g. occlusion and appearance variation. However, a standard large-scale dataset does not exist, in which different types of modalities across vision and sensors are integrated. To address the disadvantage of vision-based modalities and push towards multi/cross modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily life activities such as desktop-related and check-in-based ones in four different distinct scenarios. On the basis of our dataset, we propose a novel multi modality distillation model with attention mechanism to realize an adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves performance of action recognition compared to models trained with only RGB information. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute the community of multimodal based action understanding.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"209 1","pages":"8657-8666"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74899815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 55
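The cross-modal distillation above transfers knowledge from sensor-based teacher modalities to a vision-based student. As a baseline reference, a plain temperature-scaled soft-target distillation loss is sketched below; the paper's attention-weighted, multi-teacher variant is more elaborate than this.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target knowledge distillation from a (sensor-modality) teacher
    to a (vision-modality) student, with temperature scaling."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # scale by T^2 so gradients stay comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```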
Order-Preserving Wasserstein Discriminant Analysis
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00998
Bing Su, Jiahuan Zhou, Ying Wu
Supervised dimensionality reduction for sequence data projects the observations in sequences onto a low-dimensional subspace to better separate different sequence classes. It is typically more challenging than conventional dimensionality reduction for static data, because measuring the separability of sequences involves non-linear procedures to manipulate the temporal structures. This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. For each class, OWDA extracts the order-preserving Wasserstein barycenter and constructs the intra-class scatter as the dispersion of the training sequences around the barycenter. The inter-class distance is measured as the order-preserving Wasserstein distance between the corresponding barycenters. OWDA is able to concentrate on the distinctive differences among classes by lifting the geometric relations with temporal constraints. Experiments show that OWDA achieves competitive results on three 3D action recognition datasets.
{"title":"Order-Preserving Wasserstein Discriminant Analysis","authors":"Bing Su, Jiahuan Zhou, Ying Wu","doi":"10.1109/ICCV.2019.00998","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00998","url":null,"abstract":"Supervised dimensionality reduction for sequence data projects the observations in sequences onto a low-dimensional subspace to better separate different sequence classes. It is typically more challenging than conventional dimensionality reduction for static data, because measuring the separability of sequences involves non-linear procedures to manipulate the temporal structures. This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. For each class, OWDA extracts the order-preserving Wasserstein barycenter and constructs the intra-class scatter as the dispersion of the training sequences around the barycenter. The inter-class distance is measured as the order-preserving Wasserstein distance between the corresponding barycenters. OWDA is able to concentrate on the distinctive differences among classes by lifting the geometric relations with temporal constraints. Experiments show that OWDA achieves competitive results on three 3D action recognition datasets.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"39 1","pages":"9884-9893"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75568481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
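Once the inter-class scatter (built from order-preserving Wasserstein distances between class barycenters) and the intra-class scatter (dispersion of sequences around each barycenter) are formed, the projection comes from a Fisher-style discriminant solve. The sketch below shows that final step as a generalized eigenproblem; the scatter construction itself and the exact ratio formulation used by OWDA are assumptions here, not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_projection(S_b, S_w, out_dim):
    """Fisher-style projection: maximize between-class scatter S_b relative
    to within-class scatter S_w. Here S_b / S_w would be built from the
    order-preserving Wasserstein barycenters described in the abstract."""
    # solve S_b v = lambda * S_w v; a small ridge keeps S_w well conditioned
    eigvals, eigvecs = eigh(S_b, S_w + 1e-6 * np.eye(S_w.shape[0]))
    top = np.argsort(eigvals)[::-1][:out_dim]
    return eigvecs[:, top]            # (D, out_dim) projection matrix
```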
Better and Faster: Exponential Loss for Image Patch Matching
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00491
Shuang Wang, Yanfeng Li, Xuefeng Liang, Dou Quan, Bowu Yang, Shaowei Wei, L. Jiao
Recent studies on image patch matching pay more attention to hard sample learning, because easy samples do not contribute much to network optimization. They have proposed various hard negative sample mining strategies, but very few have addressed this problem from the perspective of loss functions. Our research shows that the conventional Siamese and triplet losses treat all samples linearly, which makes training time-consuming. Instead, we propose exponential Siamese and triplet losses, which naturally focus more on hard samples and put less emphasis on easy ones, while also speeding up optimization. To complement the exponential losses, we introduce hard positive sample mining to further enhance their effectiveness. Extensive experiments demonstrate that our proposal improves both metric and descriptor learning on several well-accepted benchmarks, and outperforms the state of the art on the UBC dataset. Moreover, it also shows better generalizability on cross-spectral image matching and image retrieval tasks.
{"title":"Better and Faster: Exponential Loss for Image Patch Matching","authors":"Shuang Wang, Yanfeng Li, Xuefeng Liang, Dou Quan, Bowu Yang, Shaowei Wei, L. Jiao","doi":"10.1109/ICCV.2019.00491","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00491","url":null,"abstract":"Recent studies on image patch matching are paying more attention on hard sample learning, because easy samples do not contribute much to the network optimization. They have proposed various hard negative sample mining strategies, but very few addressed this problem from the perspective of loss functions. Our research shows that the conventional Siamese and triplet losses treat all samples linearly, thus make the training time consuming. Instead, we propose the exponential Siamese and triplet losses, which can naturally focus more on hard samples and put less emphasis on easy ones, meanwhile, speed up the optimization. To assist the exponential losses, we introduce the hard positive sample mining to further enhance the effectiveness. The extensive experiments demonstrate our proposal improves both metric and descriptor learning on several well accepted benchmarks, and outperforms the state-of-the-arts on the UBC dataset. Moreover, it also shows a better generalizability on cross-spectral image matching and image retrieval tasks.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"15 20 1","pages":"4811-4820"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77319011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
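The abstract contrasts linear (hinge-style) losses, which treat all samples equally, with exponential ones that emphasize hard samples. One plausible form of such a triplet loss is sketched below purely for illustration; the exact formulation used in the paper may differ.

```python
import torch

def exponential_triplet_loss(anchor, positive, negative):
    """Illustrative exponential triplet loss: easy triplets (d_neg >> d_pos)
    contribute almost nothing, while hard ones dominate the gradient."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # squared distance to positive
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # squared distance to negative
    # a conventional linear triplet loss for comparison would be
    # torch.clamp(d_pos - d_neg + margin, min=0).mean()
    return torch.exp(d_pos - d_neg).mean()
```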