
Latest publications from the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

[Copyright notice]
Pub Date : 2019-01-01 DOI: 10.1109/wacv.2019.00003
{"title":"[Copyright notice]","authors":"","doi":"10.1109/wacv.2019.00003","DOIUrl":"https://doi.org/10.1109/wacv.2019.00003","url":null,"abstract":"","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127831272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00111
J. D. Stets, Zhengqin Li, J. Frisvad, Manmohan Chandraker
The appearance of a transparent object is determined by a combination of refraction and reflection, as governed by a complex function of its shape as well as the surrounding environment. Prior works on 3D reconstruction have largely ignored transparent objects due to this challenge, yet they occur frequently in real-world scenes. This paper presents an approach to estimate depths and normals for transparent objects using a single image acquired under a distant but otherwise arbitrary environment map. In particular, we use a deep convolutional neural network (CNN) for this task. Unlike opaque objects, it is challenging to acquire ground truth training data for refractive objects, thus, we propose to use a large-scale synthetic dataset. To accurately capture the image formation process, we use a physically-based renderer. We demonstrate that a CNN trained on our dataset learns to reconstruct shape and estimate segmentation boundaries for transparent objects using a single image, while also achieving generalization to real images at test time. In experiments, we extensively study the properties of our dataset and compare to baselines demonstrating its utility.
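As a rough illustration of the single-image setup described above (not the authors' actual architecture), the sketch below shows a toy PyTorch encoder-decoder that maps an RGB image to per-pixel depth, unit normals, and a segmentation mask; the layer sizes, output parameterization, and the name RefractiveShapeNet are assumptions made for this example.

import torch
import torch.nn as nn

class RefractiveShapeNet(nn.Module):
    """Toy encoder-decoder: RGB image -> depth (1 ch), normals (3 ch), mask (1 ch)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 5, 4, stride=2, padding=1),  # 1 depth + 3 normal + 1 mask
        )

    def forward(self, x):
        out = self.decoder(self.encoder(x))
        depth = out[:, 0:1]
        normals = torch.nn.functional.normalize(out[:, 1:4], dim=1)  # unit-length normals
        mask = torch.sigmoid(out[:, 4:5])                            # object vs. background
        return depth, normals, mask

# Training on synthetic renderings (hypothetical tensors) could use, e.g.,
# L1 on depth, a cosine loss on normals, and BCE on the segmentation mask.
img = torch.randn(2, 3, 64, 64)
depth, normals, mask = RefractiveShapeNet()(img)
print(depth.shape, normals.shape, mask.shape)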
Citations: 13
Local Color Mapping Combined with Color Transfer for Underwater Image Enhancement
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00157
R. Protasiuk, Adel Bibi, Bernard Ghanem
Color correction and color transfer methods have gained a lot of attention in the past few years as a means to circumvent color degradation that may occur due to various sources. In this paper, we propose a novel, simple yet powerful strategy to profoundly enhance color-distorted underwater images. The proposed approach combines both local and global information through a simple yet powerful affine transform model. Local and global information are carried through local color mapping and color covariance mapping between an input and some reference source, respectively. Several experiments on degraded underwater images demonstrate that the proposed method performs favourably against all other methods, including ones that are tailored to correcting underwater images by explicit noise modelling.
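The global half of such an affine model can be illustrated with a classical mean-and-covariance color transfer; the sketch below is a generic numpy version of that global step only (the paper additionally applies a local color mapping), and the function name and regularization constants are assumptions.

import numpy as np

def global_color_transfer(src, ref):
    """Affine (mean + covariance) transfer of ref's color statistics onto src.

    src, ref: float arrays of shape (H, W, 3) in [0, 1].
    Returns a recolored copy of src. This is only the classical global step;
    the paper combines it with a local color mapping as well.
    """
    s = src.reshape(-1, 3)
    r = ref.reshape(-1, 3)
    mu_s, mu_r = s.mean(0), r.mean(0)
    cov_s = np.cov(s, rowvar=False) + 1e-6 * np.eye(3)
    cov_r = np.cov(r, rowvar=False) + 1e-6 * np.eye(3)

    # Matrix square roots via eigendecomposition (covariances are symmetric PSD).
    def sqrtm(c):
        w, v = np.linalg.eigh(c)
        return v @ np.diag(np.sqrt(np.maximum(w, 0))) @ v.T

    a = sqrtm(cov_r) @ np.linalg.inv(sqrtm(cov_s))   # linear part of the affine map
    out = (s - mu_s) @ a.T + mu_r                    # matches ref's mean and covariance
    return np.clip(out, 0, 1).reshape(src.shape)

# Usage with random stand-in images (real use: a degraded underwater photo
# and a well-exposed reference image).
src = np.random.rand(64, 64, 3)
ref = np.random.rand(64, 64, 3)
corrected = global_color_transfer(src, ref)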
Citations: 20
HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00069
Georg Waltner, M. Opitz, Horst Possegger, H. Bischof
When the number of categories is growing into thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by learning distance metric methods that separate categories in a transformed embedding space. Unlike most methods that utilize a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While in general metric learning is directly applied on fine labels to learn embeddings, we take this one step further and incorporate hierarchical label information into the boosting framework and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings which focus on specific hierarchical classes, the retrieval accuracy can be improved compared to standard flat label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or when additional labels can be retrieved without much effort. Our approach improves R@1 over state-of-the-art methods on the biggest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We show that the clustering quality in terms of NMI score is superior to previous works.
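To make the sub-embedding idea concrete, the hedged PyTorch sketch below splits a single feature vector into per-hierarchy-level sub-embeddings and applies a separate triplet loss to each; it omits the paper's boosting and reweighting scheme, and all layer sizes and the split (32, 32, 64) are assumptions.

import torch
import torch.nn as nn

class HierarchicalEmbedder(nn.Module):
    """Toy backbone whose output is split into sub-embeddings, one per label level
    (e.g. coarse family, mid-level group, fine-grained class)."""
    def __init__(self, feat_dim=128, splits=(32, 32, 64)):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, sum(splits)))
        self.splits = splits

    def forward(self, x):
        feats = self.backbone(x)
        return torch.split(feats, self.splits, dim=1)   # one embedding per hierarchy level

triplet = nn.TripletMarginLoss(margin=0.2)
model = HierarchicalEmbedder()

# Hypothetical batch: anchors/positives/negatives sampled per hierarchy level.
anchor, pos, neg = (torch.randn(8, 3, 32, 32) for _ in range(3))
loss = 0.0
for a, p, n in zip(model(anchor), model(pos), model(neg)):
    loss = loss + triplet(a, p, n)     # each sub-embedding gets its own metric loss
loss.backward()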
Citations: 2
CNN-Based Semantic Segmentation Using Level Set Loss
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00191
Youngeun Kim, Seunghyeon Kim, Taekyung Kim, Changick Kim
These days, Convolutional Neural Networks are widely used in semantic segmentation. However, since CNN-based segmentation networks produce low-resolution outputs with rich semantic information, it is inevitable that spatial details (e.g., small objects and fine boundary information) of segmentation results will be lost. To address this problem, motivated by a variational approach to image segmentation (i.e., level set theory), we propose a novel loss function called the level set loss, which is designed to refine spatial details of segmentation results. To deal with multiple classes in an image, we first decompose the ground truth into binary images. Note that each binary image consists of background and regions belonging to a class. Then we convert level set functions into class probability maps and calculate the energy for each class. The network is trained to minimize the weighted sum of the level set loss and the cross-entropy loss. The proposed level set loss improves the spatial details of segmentation results in a time- and memory-efficient way. Furthermore, our experimental results show that the proposed loss function achieves better performance than previous approaches.
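A Chan-Vese-style region energy is one common way to realize such a level set loss; the PyTorch sketch below implements that variant as a stand-in (the paper's exact formulation may differ), with the 0.1 loss weight chosen arbitrarily.

import torch
import torch.nn.functional as F

def level_set_style_loss(probs, target, num_classes):
    """Region-based energy inspired by active contours / level sets.

    probs:  (B, C, H, W) softmax class probabilities (treated as smoothed level sets).
    target: (B, H, W) integer class labels, decomposed into per-class binary maps.
    For each class, the binary ground truth is compared against the two region
    means induced by the probability map (inside vs. outside), as in Chan-Vese.
    """
    loss = 0.0
    for c in range(num_classes):
        p = probs[:, c]                                   # soft region indicator
        g = (target == c).float()                         # binary ground truth for class c
        c_in = (g * p).sum() / (p.sum() + 1e-6)           # mean of g inside the region
        c_out = (g * (1 - p)).sum() / ((1 - p).sum() + 1e-6)
        loss = loss + ((g - c_in) ** 2 * p + (g - c_out) ** 2 * (1 - p)).mean()
    return loss / num_classes

# Total objective, as in the paper: weighted sum with cross-entropy
# (the weight 0.1 here is an arbitrary placeholder).
logits = torch.randn(2, 4, 32, 32, requires_grad=True)
target = torch.randint(0, 4, (2, 32, 32))
total = F.cross_entropy(logits, target) + 0.1 * level_set_style_loss(logits.softmax(1), target, 4)
total.backward()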
Citations: 48
Automatic Detection and Segmentation of Lentil Crop Breeding Plots From Multi-Spectral Images Captured by UAV-Mounted Camera
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00183
Imran Ahmed, M. Eramian, I. Ovsyannikov, William van der Kamp, K. Nielsen, H. Duddu, Arafia Rumali, S. Shirtliffe, K. Bett
Unmanned Aerial Vehicles (UAVs) paired with image detection and segmentation techniques can be used to extract plant phenotype information of individual breeding or research plots. Each plot contains plants of a single genetic line. Breeders are interested in selecting lines with preferred phenotypes (physical traits) that increase crop yield or resilience. Automated detection and segmentation of plots would enable automatic monitoring and quantification of plot phenotypes, allowing a faster selection process that requires far fewer person-hours than manual assessment. A detection algorithm based on Laplacian of Gaussian (LoG) blob detection and a segmentation algorithm based on a combination of unsupervised clustering and random walker image segmentation are proposed to detect and segment lentil plots from multi-spectral aerial images. Our algorithm detects and segments lentil plots from normalized difference vegetation index (NDVI) images. The detection algorithm exhibited an average precision and recall of 96.3% and 97.2%, respectively. The average Dice similarity coefficient between a detected segmented plot and its ground truth was 0.906.
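A compact stand-in for this pipeline can be built from scikit-image's blob_log and random_walker, as sketched below; the NDVI thresholds, sigma range, and beta value are illustrative assumptions rather than the tuned values used in the paper.

import numpy as np
from skimage.feature import blob_log
from skimage.segmentation import random_walker

def detect_and_segment_plots(ndvi, blob_threshold=0.1, veg_threshold=0.4):
    """Detect plot centres with Laplacian-of-Gaussian blobs, then grow regions
    with a random walker. `ndvi` is a float image in [-1, 1] computed as
    (NIR - Red) / (NIR + Red)."""
    blobs = blob_log(ndvi, min_sigma=5, max_sigma=30, threshold=blob_threshold)

    # Seed labels: each detected blob centre gets its own label; clearly
    # non-vegetated pixels are marked as background (label 1).
    markers = np.zeros(ndvi.shape, dtype=int)
    markers[ndvi < veg_threshold] = 1
    for i, (r, c, _sigma) in enumerate(blobs, start=2):
        markers[int(r), int(c)] = i

    labels = random_walker(ndvi, markers, beta=130)
    return blobs, labels

# Usage on a synthetic stand-in for an aerial NDVI tile.
ndvi = np.random.rand(256, 256)
blobs, labels = detect_and_segment_plots(ndvi)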
Citations: 20
Zero Shot License Plate Re-Identification
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00087
Mayank Gupta, Abhinav Kumar, S. Madhvanath
The problem of person, vehicle or license plate re-identification is generally treated as a multi-shot image retrieval problem. The objective of these tasks is to learn a feature representation of query images (called a "signature") and then use these signatures to match against a database of template image signatures with the aid of a distance metric. In this paper, we propose a novel approach for license plate Re-Id inspired by Zero Shot Learning. The core idea is to generate template signatures for retrieval purposes from a multi-hot text encoding of license plates instead of their images. The proposed method maps license plate images and their license plate numbers to a common embedding space using a Symmetric Triplet loss function so that an image can be queried against its text. In effect, our approach makes it possible to identify license plates whose images have never been seen before, using a large text database of license plate numbers. We show that our system is capable of highly accurate and fast re-identification of license plates, and its performance compares favorably to both OCR-based approaches and state-of-the-art image-based Re-ID approaches. In addition to the advantages of avoiding manual image labeling and the ease of creating signature databases, the minimal time and storage requirements enable our system to be deployed even on portable devices.
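As a toy illustration of retrieving a plate from its text encoding, the numpy sketch below builds a position-wise character encoding and ranks known plate numbers against an image signature by cosine similarity; the alphabet, maximum length, random projection, and the assumption that a trained image encoder already produced the 64-D embedding are all hypothetical stand-ins, not the paper's exact formulation.

import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
MAX_LEN = 10

def encode_plate(text):
    """Position-wise one-hot encoding of a plate number (a simple form of
    multi-hot text encoding), flattened into one template signature vector."""
    vec = np.zeros((MAX_LEN, len(ALPHABET)), dtype=np.float32)
    for i, ch in enumerate(text.upper()[:MAX_LEN]):
        if ch in ALPHABET:
            vec[i, ALPHABET.index(ch)] = 1.0
    return vec.ravel()

def retrieve(image_embedding, plate_texts, text_projection):
    """Rank known plate numbers against one image signature by cosine similarity.
    `text_projection` stands in for the learned mapping into the shared space."""
    templates = np.stack([encode_plate(t) @ text_projection for t in plate_texts])
    templates /= np.linalg.norm(templates, axis=1, keepdims=True) + 1e-8
    q = image_embedding / (np.linalg.norm(image_embedding) + 1e-8)
    scores = templates @ q
    return [plate_texts[i] for i in np.argsort(-scores)]

# Toy usage: a random "image embedding" and an untrained random projection.
rng = np.random.default_rng(0)
proj = rng.standard_normal((MAX_LEN * len(ALPHABET), 64)).astype(np.float32)
db = ["KA01AB1234", "MH12CD5678", "DL08EF9012"]
print(retrieve(rng.standard_normal(64).astype(np.float32), db, proj))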
Citations: 1
Fashion Attributes-to-Image Synthesis Using Attention-Based Generative Adversarial Network
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00055
Hanbit Lee, Sang-goo Lee
In this paper, we present a method to generate fashion product images that are consistent with a given set of fashion attributes. Since distinct fashion attributes relate to different local sub-regions of a product image, we propose to use a generative adversarial network with an attentional discriminator. The attribute-attended loss signal from the discriminator leads the generator to produce images that are more consistent with the given attributes. In addition, we present a generator based on a Product-of-Gaussian formulation to encode the composition of fashion attributes in an effective way. To verify whether the proposed model generates consistent images, an oracle attribute classifier is trained to judge the consistency between the given attributes and the generated images. Our model significantly outperforms the baseline model in terms of correctness measured by the pre-trained oracle classifier. We show not only qualitative performance but also images synthesized from various combinations of attributes, so they can be compared with the baseline model.
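The Product-of-Gaussian composition mentioned above follows the standard identity that precisions add and means are precision-weighted; the numpy sketch below shows only that combination step, on made-up per-attribute Gaussian codes.

import numpy as np

def product_of_gaussians(mus, sigmas):
    """Combine per-attribute Gaussian codes N(mu_i, diag(sigma_i^2)) into one joint
    Gaussian by multiplying the densities: precisions add, and the joint mean is
    the precision-weighted average of the individual means."""
    precisions = 1.0 / (np.square(sigmas) + 1e-8)         # (num_attributes, dim)
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.sqrt(joint_var)

# Toy example: three attribute encoders (e.g. sleeve length, color, pattern),
# each emitting a 4-D Gaussian code; all values are arbitrary.
mus = np.array([[0.5, 0.0, 1.0, -1.0],
                [0.2, 0.3, 0.8, -0.5],
                [0.0, 0.1, 1.2, -0.8]])
sigmas = np.ones_like(mus) * np.array([[0.5], [1.0], [2.0]])  # low sigma = confident
mu, sigma = product_of_gaussians(mus, sigmas)
print(mu, sigma)   # the joint mean leans toward the most confident attribute code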
Citations: 5
Digging Deeper Into Egocentric Gaze Prediction
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00035
H. R. Tavakoli, Esa Rahtu, Juho Kannala, A. Borji
This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed versus strong spatial prior baselines. Task-specific cues such as vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for ego-centric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up attention models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better compared to traditional features, (4) as opposed to hand regions, the manipulation point is a strong influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) the knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects.
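A minimal version of the recurrent predictor described here (per-frame deep features fed to a gated recurrent unit that regresses the next fixation) could look like the PyTorch sketch below; the feature dimension, hidden size, and sigmoid output in normalized image coordinates are assumptions.

import torch
import torch.nn as nn

class GazeGRU(nn.Module):
    """Toy recurrent gaze predictor: per-frame CNN features -> GRU -> next fixation (x, y)."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)     # normalized image coordinates in [0, 1]

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim), e.g. pooled CNN features per video frame.
        out, _ = self.gru(frame_feats)
        return torch.sigmoid(self.head(out[:, -1]))   # fixation predicted from the last step

# Hypothetical usage: 8 frames of 512-D features for a batch of 4 clips.
feats = torch.randn(4, 8, 512)
next_fixation = GazeGRU()(feats)       # shape (4, 2)
print(next_fixation.shape)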
Citations: 24
Skeleton-Based Action Recognition of People Handling Objects
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00014
Sunoh Kim, Kimin Yun, Jongyoul Park, J. Choi
In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions by graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, which include human joints with high confidence scores obtained in pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments over an open benchmark and our own data sets, we verify the validity of our framework in that our method outperforms the state-of-the-art method for skeleton-based action recognition.
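The frame-sampling and graph-convolution ingredients can be illustrated with the short PyTorch sketch below: frames are ranked by mean joint confidence, and a single layer multiplies a row-normalized adjacency matrix (with self-loops) by the joint features and a learned weight. This is a generic GCN layer on a toy skeleton, not the authors' exact network; the skeleton topology and dimensions are assumptions.

import torch
import torch.nn as nn

def sample_confident_frames(confidences, k):
    """Pick the k frames whose mean joint-confidence score is highest.
    confidences: (num_frames, num_joints) pose-estimation scores."""
    return confidences.mean(dim=1).topk(k).indices.sort().values

class GraphConv(nn.Module):
    """One graph-convolution layer over a skeleton: X' = relu(A_norm @ X @ W)."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        a = adjacency + torch.eye(adjacency.shape[0])        # add self-loops
        d = a.sum(dim=1)
        self.register_buffer("a_norm", a / d.unsqueeze(1))   # row-normalized adjacency
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):          # x: (batch, num_joints, in_dim), e.g. 2-D joint coordinates
        return torch.relu(self.lin(self.a_norm @ x))

# Toy usage: a 5-joint chain skeleton with 2-D coordinates per joint.
adj = torch.zeros(5, 5)
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1
layer = GraphConv(2, 16, adj)
joints = torch.randn(3, 5, 2)          # batch of 3 sampled frames
print(layer(joints).shape)             # (3, 5, 16)

conf = torch.rand(30, 5)               # confidences for 30 frames x 5 joints
print(sample_confident_frames(conf, k=3))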
Citations: 27