
Latest publications from the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00199
Qinyi Wang, Yexin Zhang, Junsong Yuan, Yilong Lu
The recently developed event cameras can directly sense motion in the scene by generating an asynchronous sequence of events, i.e., event streams, where each individual event (x, y, t) corresponds to the space-time location at which a pixel sensor captures an intensity change. Compared with RGB cameras, event cameras are frameless but can capture much faster motion, and therefore have great potential for recognizing fast-motion gestures. To deal with the unique output of event cameras, previous methods often treat event streams as time sequences and thus do not fully exploit the space-time sparsity of the event stream data. In this work, we treat the event stream as a set of 3D points in space-time, i.e., space-time event clouds. To analyze event clouds and recognize gestures, we propose to leverage PointNet, a neural network architecture originally designed for matching and recognizing 3D point clouds. We further adapt PointNet to cater to event clouds for real-time gesture recognition. On the benchmark dataset for event-camera-based gesture recognition, the IBM DVS128 Gesture dataset, our proposed method achieves a high accuracy of 97.08% and performs best among existing methods.
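To make the space-time event-cloud idea concrete, the sketch below implements a minimal PointNet-style classifier that consumes raw (x, y, t) points: shared per-point MLPs followed by a symmetric max-pool give an order-invariant descriptor of the whole cloud. The layer widths, the fixed number of sampled events and the gesture-class count are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class TinyEventPointNet(nn.Module):
    """Minimal PointNet-style classifier over space-time event clouds.

    Each event is treated as a 3D point (x, y, t); shared per-point MLPs
    followed by a global max-pool yield an order-invariant cloud feature.
    """
    def __init__(self, num_classes=11):  # class count is an assumption
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, num_classes),
        )

    def forward(self, events):
        # events: (batch, num_events, 3), columns (x, y, t), normalized to [0, 1]
        feats = self.point_mlp(events.transpose(1, 2))  # (batch, 256, num_events)
        global_feat = feats.max(dim=2).values           # symmetric max-pool
        return self.classifier(global_feat)             # gesture logits

# Usage sketch: 1024 events sampled from one sliding time window per gesture.
logits = TinyEventPointNet()(torch.rand(4, 1024, 3))
```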
Citations: 80
Autonomous Curiosity for Real-Time Training Onboard Robotic Agents
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00163
Ervin Teng, Bob Iannucci
Learning requires both study and curiosity. A good learner is not only good at extracting information from the data given to it, but also skilled at finding the right new information to learn from. This is especially true when a human operator is required to provide the ground truth—such a source should only be queried sparingly. In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a robotic platform, one where motion produces new views of the subject. We propose a deep reinforcement learning approach that decides when to ask the human user for ground truth, and when to move. Through a series of experiments, we demonstrate that our agent learns a movement and request policy that is at least 3x more effective at using human user interactions to train an object detector than untrained approaches, and is generalizable to a variety of subjects and environments.
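The decide-when-to-ask behaviour can be pictured as a small value network scoring a discrete action set (query the operator for a label vs. move the camera); the state dimensionality, the action names and the epsilon-greedy selection below are assumptions made for illustration, not the agent described in the paper.

```python
import random
import torch
import torch.nn as nn

# Illustrative action set: ask the human for ground truth, or move the view.
ACTIONS = ["request_label", "move_left", "move_right", "move_up", "move_down"]

class CuriosityPolicy(nn.Module):
    """Tiny Q-network mapping a state embedding to action values."""
    def __init__(self, state_dim=128, num_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, state):
        return self.net(state)

def select_action(policy, state, epsilon=0.1):
    """Epsilon-greedy choice between querying the operator and moving."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(policy(state).argmax(dim=-1).item())

# Usage sketch: the state might encode detector confidence and camera pose.
policy = CuriosityPolicy()
action = ACTIONS[select_action(policy, torch.rand(1, 128))]
```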
Citations: 4
Toward Computer Vision Systems That Understand Real-World Assembly Processes
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00051
Jonathan D. Jones, Gregory Hager, S. Khudanpur
Many applications of computer vision require robust systems that can parse complex structures as they evolve in time. Using a block construction task as a case study, we illustrate the main components involved in building such systems. We evaluate performance at three increasingly-detailed levels of spatial granularity on two multimodal (RGBD + IMU) datasets. On the first, designed to match the assumptions of the model, we report better than 90% accuracy at the finest level of granularity. On the second, designed to test the robustness of our model under adverse, real-world conditions, we report 67% accuracy and 91% precision at the mid-level of granularity. We show that this seemingly simple process presents many opportunities to expand the frontiers of computer vision and action recognition.
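As a concrete reading of the reported numbers, accuracy and precision at a given level of granularity reduce to simple counts over predicted and true labels; the helper below assumes integer label arrays per level and is only an illustration, not the evaluation code used in the paper.

```python
import numpy as np

def accuracy_and_precision(y_true, y_pred, positive_label):
    """Overall accuracy plus precision for one label of interest.

    y_true, y_pred: 1-D integer label arrays at a single level of
    spatial granularity (e.g. per frame or per block).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float((y_true == y_pred).mean())
    predicted_pos = y_pred == positive_label
    if predicted_pos.sum() == 0:
        precision = 0.0
    else:
        precision = float((y_true[predicted_pos] == positive_label).mean())
    return accuracy, precision

# Usage sketch with toy labels.
acc, prec = accuracy_and_precision([0, 1, 1, 2], [0, 1, 2, 2], positive_label=2)
```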
Citations: 7
CDNet: Single Image De-Hazing Using Unpaired Adversarial Training
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00127
Akshay Dudhane, S. Murala
Outdoor scene images generally suffer visibility degradation in the presence of aerosol particles such as haze, fog and smoke: the particles scatter the light rays reflected from object surfaces, attenuating the light intensity. The effect of haze is inversely proportional to the transmission coefficient of the scene point, so estimating an accurate transmission map (TrMap) is a key step in reconstructing the haze-free scene. Previous methods used various assumptions/priors to estimate the scene TrMap, and available end-to-end dehazing approaches rely on supervised training to predict the TrMap on synthetically generated paired hazy images. Despite their success, these approaches fail under extreme real-world haze because real-world hazy image pairs for training the network are unavailable. In this paper, a cycle-consistent generative adversarial network for single-image dehazing, named CDNet, is proposed and trained in an unpaired manner on a real-world hazy image dataset. The generator network of CDNet uses an encoder-decoder architecture to estimate the object-level TrMap, followed by the optical model to recover the haze-free scene. We conduct experiments on four datasets: D-HAZY [1], ImageNet [5], SOTS [20] and real-world images. The structural similarity index, peak signal-to-noise ratio and CIEDE2000 metric are used to evaluate the performance of the proposed CDNet. Experiments on benchmark datasets show that the proposed CDNet outperforms existing state-of-the-art methods for single-image haze removal.
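The "optical model" step refers to the standard atmospheric scattering model, I(x) = J(x) t(x) + A (1 − t(x)); given an estimated transmission map, the haze-free scene is recovered by inverting it. A minimal sketch is below; the global airlight estimate and the transmission floor are illustrative assumptions.

```python
import numpy as np

def recover_haze_free(hazy, trmap, airlight, t_min=0.1):
    """Invert the atmospheric scattering model: J = (I - A * (1 - t)) / t.

    hazy:     H x W x 3 hazy image in [0, 1]
    trmap:    H x W transmission map (e.g. predicted by an encoder-decoder)
    airlight: length-3 global atmospheric light estimate
    """
    t = np.clip(trmap, t_min, 1.0)[..., None]      # floor t to avoid blow-up
    scene = (hazy - airlight * (1.0 - t)) / t
    return np.clip(scene, 0.0, 1.0)

# Usage sketch with a dummy image and a constant transmission estimate.
hazy = np.random.rand(64, 64, 3)
dehazed = recover_haze_free(hazy, np.full((64, 64), 0.6), np.array([0.9, 0.9, 0.9]))
```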
Citations: 48
Defocus Magnification Using Conditional Adversarial Networks
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00147
P. Sakurikar, Ishit Mehta, P J Narayanan
Defocus magnification is the process of rendering a shallow depth-of-field in an image captured using a camera with a narrow aperture. Defocus magnification is a useful tool in photography for emphasis on the subject and for highlighting background bokeh. Estimating the per-pixel blur kernel or the depth-map of the scene followed by spatially-varying re-blurring is the standard approach to defocus magnification. We propose a single-step approach that directly converts a narrow-aperture image to a wide-aperture image. We use a conditional adversarial network trained on multi-aperture images created from light-fields. We use a novel loss term based on a composite focus measure to improve generalization and show high quality defocus magnification.
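One way to picture the training objective is a pix2pix-style conditional-GAN generator loss with an extra image-space term; the paper's composite focus measure is not reproduced here, so the sketch substitutes an L1 reconstruction term plus a hypothetical focus_measure hook and should be read as an assumption-laden outline rather than the method itself.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, fake_wide, real_wide,
                   focus_measure=None, l1_weight=100.0, focus_weight=1.0):
    """Conditional-GAN generator objective (pix2pix-style illustration).

    disc_fake_logits: discriminator logits on (narrow-aperture, generated) pairs
    fake_wide / real_wide: generated and ground-truth wide-aperture images
    focus_measure: optional callable returning a per-image defocus map; a
        hypothetical stand-in for the paper's composite focus measure.
    """
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    recon = F.l1_loss(fake_wide, real_wide)
    loss = adv + l1_weight * recon
    if focus_measure is not None:
        loss = loss + focus_weight * F.l1_loss(
            focus_measure(fake_wide), focus_measure(real_wide))
    return loss
```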
Citations: 5
Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00111
J. D. Stets, Zhengqin Li, J. Frisvad, Manmohan Chandraker
The appearance of a transparent object is determined by a combination of refraction and reflection, as governed by a complex function of its shape as well as the surrounding environment. Prior works on 3D reconstruction have largely ignored transparent objects due to this challenge, yet they occur frequently in real-world scenes. This paper presents an approach to estimate depths and normals for transparent objects using a single image acquired under a distant but otherwise arbitrary environment map. In particular, we use a deep convolutional neural network (CNN) for this task. Unlike opaque objects, it is challenging to acquire ground truth training data for refractive objects, thus, we propose to use a large-scale synthetic dataset. To accurately capture the image formation process, we use a physically-based renderer. We demonstrate that a CNN trained on our dataset learns to reconstruct shape and estimate segmentation boundaries for transparent objects using a single image, while also achieving generalization to real images at test time. In experiments, we extensively study the properties of our dataset and compare to baselines demonstrating its utility.
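A minimal sketch of the single-image depth-and-normal setup is a shared convolutional encoder with two decoding heads, one per output; the channel widths and the shared trunk are assumptions for illustration and do not correspond to the network in the paper.

```python
import torch
import torch.nn as nn

class DepthNormalNet(nn.Module):
    """Shared encoder with separate depth and surface-normal heads (illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        def head(out_channels):
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
            )
        self.depth_head = head(1)    # per-pixel depth
        self.normal_head = head(3)   # per-pixel surface normal

    def forward(self, image):
        feats = self.encoder(image)
        depth = self.depth_head(feats)
        normals = nn.functional.normalize(self.normal_head(feats), dim=1)
        return depth, normals

# Usage sketch on a 128x128 RGB render of a transparent object.
depth, normals = DepthNormalNet()(torch.rand(1, 3, 128, 128))
```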
Citations: 13
[Copyright notice]
Pub Date : 2019-01-01 DOI: 10.1109/wacv.2019.00003
{"title":"[Copyright notice]","authors":"","doi":"10.1109/wacv.2019.00003","DOIUrl":"https://doi.org/10.1109/wacv.2019.00003","url":null,"abstract":"","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127831272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Segmenting Sky Pixels in Images: Analysis and Comparison
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00189
Cecilia La Place, Aisha Urooj Khan, A. Borji
This work addresses sky segmentation, the task of determining sky and non-sky pixels in images, and improving upon existing state-of-the-art models. Outdoor scene parsing models are often trained on ideal datasets and produce high-quality results. However, this leads to inferior performance when applied to real-world images. The quality of scene parsing, particularly sky segmentation, decreases in night-time images, images involving varying weather conditions, and scene changes due to seasonal weather. We address these challenges using the RefineNet model in conjunction with two datasets: SkyFinder, and a subset of the SUN database containing sky regions (SUN-sky, henceforth). We achieve an improvement of 10-15% in the average MCR compared to prior methods using the SkyFinder dataset, and nearly 36% improvement from an off-the-shelf model in terms of average mIOU score. Employing fully connected conditional random fields as a post processing method demonstrates further enhancement of our results. Furthermore, by analyzing models over images with respect to two aspects, time of day and weather conditions, we find that when facing the same challenges as prior methods, our trained models significantly outperform them.
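For reference, the mIOU figure quoted above averages per-class intersection-over-union; a minimal computation over binary sky/non-sky masks is sketched below (the two-class setup matches the task description but is not the authors' evaluation script).

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2):
    """Mean intersection-over-union for sky (1) vs. non-sky (0) label masks."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious))

# Usage sketch with toy 4-pixel masks.
score = mean_iou([1, 1, 0, 0], [1, 0, 0, 0])
```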
Citations: 9
Attentive and Adversarial Learning for Video Summarization
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00173
Tsu-Jui Fu, Shao-Heng Tai, Hwann-Tzong Chen
This paper aims to address the video summarization problem via attention-aware and adversarial training. We formulate the problem as a sequence-to-sequence task, where the input sequence is an original video and the output sequence is its summarization. We propose a GAN-based training framework, which combines the merits of unsupervised and supervised video summarization approaches. The generator is an attention-aware Ptr-Net that generates the cutting points of summarization fragments. The discriminator is a 3D CNN classifier to judge whether a fragment is from a ground-truth or a generated summarization. The experiments show that our method achieves state-of-the-art results on SumMe, TVSum, YouTube, and LoL datasets with 1.5% to 5.6% improvements. Our Ptr-Net generator can overcome the unbalanced training-test length in the seq2seq problem, and our discriminator is effective in leveraging unpaired summarizations to achieve better performance.
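The adversarial setup can be pictured as alternating updates in which the discriminator learns to separate ground-truth summaries from generated ones and the generator is rewarded for fooling it; the generator and discriminator below are placeholder modules, not the paper's Ptr-Net or 3D-CNN implementations.

```python
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, g_opt, d_opt, video, real_summary):
    """One GAN step: D scores real vs. generated summaries, G tries to fool D.

    generator(video)        -> tensor representing a generated summary
    discriminator(summary)  -> realness logits for the summary fragments
    """
    # --- discriminator update ---
    fake_summary = generator(video).detach()
    real_logits = discriminator(real_summary)
    fake_logits = discriminator(fake_summary)
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- generator update ---
    gen_logits = discriminator(generator(video))
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```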
Citations: 53
Photo-Sketching: Inferring Contour Drawings From Images
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00154
Mengtian Li, Zhe L. Lin, R. Mech, Ersin Yumer, Deva Ramanan
Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On one hand, they are the 2D elements that convey 3D shapes, on the other hand, they are indicative of occlusion events and thus separation of objects or semantic concepts. In this paper, we aim to generate contour drawings, boundary-like drawings that capture the outline of the visual scene. Prior art often cast this problem as boundary detection. However, the set of visual cues presented in the boundary detection output are different from the ones in contour drawings, and also the artistic style is ignored. We address these issues by collecting a new dataset of contour drawings and proposing a learning-based method that resolves diversity in the annotation and, unlike boundary detectors, can work with imperfect alignment of the annotation and the actual ground truth. Our method surpasses previous methods quantitatively and qualitatively. Surprisingly, when our model fine-tunes on BSDS500, we achieve the state-of-the-art performance in salient boundary detection, suggesting contour drawing might be a scalable alternative to boundary annotation, which at the same time is easier and more interesting for annotators to draw.
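One plausible reading of "resolves diversity in the annotation" is a loss that, for each image, penalizes only the distance to the best-matching human sketch among the available ones; the function below is such an illustration under that assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def min_over_annotations_loss(pred_sketch, annotation_stack):
    """Penalize only the closest of several human contour drawings.

    pred_sketch:      (B, 1, H, W) predicted contour map
    annotation_stack: (B, K, 1, H, W) K diverse sketches of the same image
    """
    k = annotation_stack.shape[1]
    expanded = pred_sketch.unsqueeze(1).expand(-1, k, -1, -1, -1)
    per_annotation = F.l1_loss(expanded, annotation_stack, reduction="none")
    per_annotation = per_annotation.flatten(start_dim=2).mean(dim=2)  # (B, K)
    return per_annotation.min(dim=1).values.mean()

# Usage sketch: 5 annotators per image, 64x64 contour maps.
loss = min_over_annotations_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 5, 1, 64, 64))
```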
Citations: 93