
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) - Latest Publications

A Hierarchical Grocery Store Image Dataset With Visual and Semantic Labels
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00058
Marcus Klasson, Cheng Zhang, H. Kjellström
Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application - classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding information of the product from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results evaluated on pretrained convolutional neural networks often used for image understanding purposes, and also a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
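The abstract describes the CNN baseline only at a high level. As a rough, hedged illustration of such a baseline (not the authors' code), the sketch below fine-tunes a pretrained ResNet-18 on an assumed one-folder-per-class layout of the grocery images; the paths, class layout, and hyperparameters are placeholder assumptions.

```python
# Minimal fine-tuning sketch for a pretrained CNN classifier (illustrative assumptions only).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Assumed directory layout: one sub-folder per class (e.g. "Apple", "Milk-Carton", ...).
train_set = datasets.ImageFolder("grocery_store_dataset/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(pretrained=True)                              # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))    # new classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)          # only the new head is updated here
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```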
Citations: 32
CDNet: Single Image De-Hazing Using Unpaired Adversarial Training
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00127
Akshay Dudhane, S. Murala
Outdoor scene images generally undergo visibility degradation in the presence of aerosol particles such as haze, fog and smoke. This is because aerosol particles scatter the light rays reflected from object surfaces, which attenuates the light intensity. The effect of haze is inversely proportional to the transmission coefficient of the scene point. Thus, estimating an accurate transmission map (TrMap) is a key step in reconstructing the haze-free scene. Previous methods used various assumptions/priors to estimate the scene TrMap. Also, available end-to-end dehazing approaches make use of supervised training to predict the TrMap on synthetically generated paired hazy images. Despite the success of previous approaches, they fail under extreme real-world haze because real-world hazy image pairs are unavailable for training the network. Thus, in this paper, a cycle-consistent generative adversarial network for single image de-hazing, named CDNet, is proposed, which is trained in an unpaired manner on a real-world hazy image dataset. The generator network of CDNet comprises an encoder-decoder architecture that estimates the object-level TrMap, followed by an optical model to recover the haze-free scene. We conduct experiments on four datasets, namely D-HAZY [1], Imagenet [5], SOTS [20] and real-world images. The structural similarity index, peak signal-to-noise ratio and the CIEDE2000 metric are used to evaluate the performance of the proposed CDNet. Experiments on benchmark datasets show that the proposed CDNet outperforms existing state-of-the-art methods for single image haze removal.
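The "optical model" the abstract refers to is, in the dehazing literature, the standard atmospheric scattering model I = J*t + A*(1 - t). The short sketch below (not the paper's code; the function name and the lower bound t_min are assumptions) shows how a haze-free estimate J can be recovered once a transmission map t (TrMap) and an airlight estimate A are available.

```python
# Inverting the atmospheric scattering model I = J*t + A*(1 - t) to recover the scene J.
import numpy as np

def recover_scene(hazy, trmap, airlight, t_min=0.1):
    """hazy: H x W x 3 float array in [0, 1]; trmap: H x W transmission map in (0, 1];
    airlight: length-3 global atmospheric light estimate."""
    t = np.clip(trmap, t_min, 1.0)[..., None]     # lower bound keeps the division stable
    scene = (hazy - airlight) / t + airlight
    return np.clip(scene, 0.0, 1.0)
```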
Citations: 48
Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00199
Qinyi Wang, Yexin Zhang, Junsong Yuan, Yilong Lu
The recently developed event cameras can directly sense motion in the scene by generating an asynchronous sequence of events, i.e., event streams, where each individual event (x, y, t) corresponds to the space-time location at which a pixel sensor captures an intensity change. Compared with RGB cameras, event cameras are frameless but can capture much faster motion, and therefore have great potential for recognizing gestures involving fast motion. To deal with the unique output of event cameras, previous methods often treat event streams as time sequences and thus do not fully exploit the space-time sparsity of the event stream data. In this work, we treat the event stream as a set of 3D points in space-time, i.e., space-time event clouds. To analyze event clouds and recognize gestures, we propose to leverage PointNet, a neural network architecture originally designed for matching and recognizing 3D point clouds. We further adapt PointNet to cater to event clouds for real-time gesture recognition. On the benchmark dataset for event-camera-based gesture recognition, i.e., the IBM DVS128 Gesture dataset, our proposed method achieves a high accuracy of 97.08% and performs the best among existing methods.
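As a rough illustration of the core idea (not the authors' implementation), the sketch below treats a window of events (x, y, t) as an unordered 3D point set and classifies it with a PointNet-style network: a shared per-point MLP followed by an order-invariant max-pool. The class count and layer sizes are placeholder assumptions.

```python
# PointNet-style classification of an event cloud (illustrative sketch).
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=11):            # class count is an assumption here
        super().__init__()
        self.point_mlp = nn.Sequential(             # applied to every event independently
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, events):                      # events: (batch, num_events, 3) = (x, y, t)
        per_point = self.point_mlp(events)          # (batch, num_events, 128)
        global_feat, _ = per_point.max(dim=1)       # symmetric pooling over the event cloud
        return self.classifier(global_feat)

# Toy usage: 2 clips, 1024 events each, with normalized (x, y, t) coordinates.
logits = TinyPointNet()(torch.rand(2, 1024, 3))
```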
Citations: 80
Autonomous Curiosity for Real-Time Training Onboard Robotic Agents
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00163
Ervin Teng, Bob Iannucci
Learning requires both study and curiosity. A good learner is not only good at extracting information from the data given to it, but also skilled at finding the right new information to learn from. This is especially true when a human operator is required to provide the ground truth—such a source should only be queried sparingly. In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a robotic platform, one where motion produces new views of the subject. We propose a deep reinforcement learning approach that decides when to ask the human user for ground truth, and when to move. Through a series of experiments, we demonstrate that our agent learns a movement and request policy that is at least 3x more effective at using human user interactions to train an object detector than untrained approaches, and is generalizable to a variety of subjects and environments.
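The abstract leaves the agent itself unspecified. As a purely illustrative toy (not the paper's method), the sketch below trains a two-action policy, move vs. query the human, with a REINFORCE update against a made-up reward that trades detector improvement against labeling cost; the state features, reward shape, and query cost are all assumptions.

```python
# Toy REINFORCE sketch of a "move or query the human" decision policy (illustrative only).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))  # logits for [move, query]
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def step_environment(action):
    # Hypothetical environment: querying costs operator time but may improve the detector.
    detector_gain = torch.rand(1).item()
    return detector_gain - (0.5 if action == 1 else 0.0)

log_probs, rewards = [], []
state = torch.tensor([[0.3, 5.0]])                  # [detector confidence, steps since last query]
for _ in range(10):
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    rewards.append(step_environment(action.item()))

returns = torch.tensor(rewards)
loss = -(torch.stack(log_probs).squeeze() * (returns - returns.mean())).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```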
Citations: 4
Toward Computer Vision Systems That Understand Real-World Assembly Processes
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00051
Jonathan D. Jones, Gregory Hager, S. Khudanpur
Many applications of computer vision require robust systems that can parse complex structures as they evolve in time. Using a block construction task as a case study, we illustrate the main components involved in building such systems. We evaluate performance at three increasingly-detailed levels of spatial granularity on two multimodal (RGBD + IMU) datasets. On the first, designed to match the assumptions of the model, we report better than 90% accuracy at the finest level of granularity. On the second, designed to test the robustness of our model under adverse, real-world conditions, we report 67% accuracy and 91% precision at the mid-level of granularity. We show that this seemingly simple process presents many opportunities to expand the frontiers of computer vision and action recognition.
Citations: 7
Semantic Correspondence in the Wild
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00126
Akila Pemasiri, Kien Nguyen Thanh, S. Sridharan, C. Fookes
Semantic correspondence estimation, in which the depicted object instances deform extensively from one instance to the next, is a challenging problem in computer vision that has received much attention. Unfortunately, all existing approaches require prior knowledge of the object classes present in the image environment. This is an unwanted restriction, as it can prevent the establishment of semantic correspondence across object classes in wild conditions, where it is uncertain which classes will be of interest. In contrast, in this paper we formulate the semantic correspondence estimation task as a keypoint detection process in which image-to-class classification and image-to-image correspondence are solved simultaneously. Identifying object classes within the same framework used to establish correspondence increases this approach's applicability in real-world scenarios. The use of object regions in the process also enhances accuracy while constraining the search space, thus improving overall efficiency. This new approach is compared with the state of the art on publicly available datasets to validate its capability for improved semantic correspondence estimation in wild conditions.
Citations: 1
Attentive and Adversarial Learning for Video Summarization
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00173
Tsu-Jui Fu, Shao-Heng Tai, Hwann-Tzong Chen
This paper aims to address the video summarization problem via attention-aware and adversarial training. We formulate the problem as a sequence-to-sequence task, where the input sequence is an original video and the output sequence is its summarization. We propose a GAN-based training framework, which combines the merits of unsupervised and supervised video summarization approaches. The generator is an attention-aware Ptr-Net that generates the cutting points of summarization fragments. The discriminator is a 3D CNN classifier to judge whether a fragment is from a ground-truth or a generated summarization. The experiments show that our method achieves state-of-the-art results on SumMe, TVSum, YouTube, and LoL datasets with 1.5% to 5.6% improvements. Our Ptr-Net generator can overcome the unbalanced training-test length in the seq2seq problem, and our discriminator is effective in leveraging unpaired summarizations to achieve better performance.
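A heavily simplified sketch of the adversarial setup described above follows. Here the generator is a plain frame scorer rather than a Ptr-Net and the discriminator is a generic classifier over feature sequences rather than a 3D CNN, so it only illustrates the generator/discriminator objectives, not the paper's architecture; the feature dimension, summary length, and learning rates are assumptions.

```python
# Simplified adversarial training step for video summarization (illustrative only).
import torch
import torch.nn as nn

frame_dim = 512
generator = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU(), nn.Linear(256, 1))
discriminator = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

video = torch.rand(1, 300, frame_dim)        # 300 frames of precomputed features (toy data)
real_summary = video[:, ::15]                # stand-in for a ground-truth summary

# Generator step: weight frames by predicted importance and try to fool the discriminator.
scores = torch.sigmoid(generator(video))                                  # (1, 300, 1)
fake_summary = (video * scores)[:, scores.squeeze(-1)[0].topk(20).indices]
g_loss = bce(discriminator(fake_summary).mean(dim=1), torch.ones(1, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()

# Discriminator step: real summaries -> 1, generated summaries -> 0.
d_real = bce(discriminator(real_summary).mean(dim=1), torch.ones(1, 1))
d_fake = bce(discriminator(fake_summary.detach()).mean(dim=1), torch.zeros(1, 1))
d_loss = d_real + d_fake
opt_d.zero_grad()
d_loss.backward()
opt_d.step()
```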
Citations: 53
Photo-Sketching: Inferring Contour Drawings From Images
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00154
Mengtian Li, Zhe L. Lin, R. Mech, Ersin Yumer, Deva Ramanan
Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On one hand, they are the 2D elements that convey 3D shapes; on the other hand, they are indicative of occlusion events and thus of the separation of objects or semantic concepts. In this paper, we aim to generate contour drawings, boundary-like drawings that capture the outline of the visual scene. Prior art often casts this problem as boundary detection. However, the set of visual cues presented in the boundary detection output differs from that in contour drawings, and the artistic style is ignored. We address these issues by collecting a new dataset of contour drawings and proposing a learning-based method that resolves diversity in the annotation and, unlike boundary detectors, can work with imperfect alignment between the annotation and the actual ground truth. Our method surpasses previous methods quantitatively and qualitatively. Surprisingly, when our model is fine-tuned on BSDS500, we achieve state-of-the-art performance in salient boundary detection, suggesting contour drawing might be a scalable alternative to boundary annotation, which at the same time is easier and more interesting for annotators to draw.
Citations: 93
Ventral-Dorsal Neural Networks: Object Detection Via Selective Attention
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00110
M. K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson Reesee, Azadeh Moghtaderi, Ming-Hsuan Yang, D. Noelle
Deep Convolutional Neural Networks (CNNs) have been repeatedly proven to perform well on image classification tasks. Object detection methods, however, are still in need of significant improvements. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets) which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition — the "what" of the signal - and extracting location related information — the "where" of the signal. The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by "what" information, while the dorsal pathway, into the parietal lobe, is dominated by "where" information. Inspired by this structure, we propose the integration of a "Ventral Network" and a "Dorsal Network", which are complementary. Information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual network framework sharpens the focus of object detection. Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images displays substantial qualitative and quantitative benefits of VDNet.
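As a loose illustration of the "what guides where" idea (a simplification, not the paper's architecture), the sketch below turns a classification backbone's activation map into a spatial attention mask that re-weights the input an object localizer would consume; the backbone choice and the way the attention map is formed are assumptions.

```python
# Classification activations used as spatial attention for a downstream localizer (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet18(pretrained=True)
ventral = nn.Sequential(*list(backbone.children())[:-2])    # "what" stream: conv feature maps

image = torch.rand(1, 3, 224, 224)
feat = ventral(image)                                        # (1, 512, 7, 7)
attention = feat.mean(dim=1, keepdim=True)                   # coarse class-evidence map
attention = torch.sigmoid(F.interpolate(attention, size=image.shape[-2:],
                                        mode="bilinear", align_corners=False))
attended_image = image * attention                           # input the "where" stream would consume
```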
Citations: 18
Segmenting Sky Pixels in Images: Analysis and Comparison
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00189
Cecilia La Place, Aisha Urooj Khan, A. Borji
This work addresses sky segmentation, the task of determining sky and non-sky pixels in images, and improves upon existing state-of-the-art models. Outdoor scene parsing models are often trained on ideal datasets and produce high-quality results. However, this leads to inferior performance when they are applied to real-world images. The quality of scene parsing, particularly sky segmentation, decreases for night-time images, images involving varying weather conditions, and scenes that change with seasonal weather. We address these challenges using the RefineNet model in conjunction with two datasets: SkyFinder, and a subset of the SUN database containing sky regions (SUN-sky, henceforth). We achieve an improvement of 10-15% in the average MCR compared to prior methods using the SkyFinder dataset, and nearly 36% improvement over an off-the-shelf model in terms of average mIOU score. Employing fully connected conditional random fields as a post-processing step further enhances our results. Furthermore, by analyzing models over images with respect to two aspects, time of day and weather conditions, we find that when facing the same challenges as prior methods, our trained models significantly outperform them.
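For reference, the sketch below computes the two metrics quoted above for a binary sky mask, assuming MCR denotes the per-image misclassification rate and mIOU the mean intersection-over-union over the sky and non-sky classes (the function names are illustrative).

```python
# Evaluation metrics for binary sky segmentation: misclassification rate and mean IoU.
import numpy as np

def misclassification_rate(pred, gt):
    """Fraction of pixels whose predicted label differs from ground truth."""
    return float(np.mean(pred.astype(bool) != gt.astype(bool)))

def mean_iou(pred, gt):
    """Mean IoU over the two classes (sky = 1, non-sky = 0)."""
    ious = []
    for cls in (0, 1):
        inter = np.logical_and(pred == cls, gt == cls).sum()
        union = np.logical_or(pred == cls, gt == cls).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy usage on a random prediction vs. ground-truth mask.
pred = np.random.randint(0, 2, (480, 640))
gt = np.random.randint(0, 2, (480, 640))
print(misclassification_rate(pred, gt), mean_iou(pred, gt))
```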
Citations: 9