
Latest Publications: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Semantic Correspondence in the Wild
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00126
Akila Pemasiri, Kien Nguyen Thanh, S. Sridharan, C. Fookes
Semantic correspondence estimation where the object instances depicted are deformed extensively from one instance to the next is a challenging problem in computer vision that has received much attention. Unfortunately, all existing approaches require prior knowledge of the object classes which are present in the image environment. This is an unwanted restriction as it can prevent the establishment of semantic correspondence across object classes in wild conditions when it is uncertain which classes will be of interest. In contrast, in this paper we formulate the semantic correspondence estimation task as a key point detection process in which image-to-class classification and image-to-image correspondence are solved simultaneously. Identifying object classes within the same framework to establish correspondence increases this approach's applicability in real-world scenarios. The use of object regions in the process also enhances the accuracy while constraining the search space, thus improving overall efficiency. This new approach is compared with the state-of-the-art on publicly available datasets to validate its capability for improved semantic correspondence estimation in wild conditions.
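As a rough illustration of the class-aware matching idea, not the authors' actual pipeline, the sketch below matches keypoints between two images by mutual nearest-neighbour descriptor search, restricted to pairs assigned the same predicted object class; the descriptors and per-keypoint class predictions are assumed to come from an upstream detector.

```python
# Hedged sketch: class-aware keypoint matching via mutual nearest neighbours.
# The descriptors and per-keypoint class labels are assumed inputs; the
# paper's joint detection/classification network is not reproduced here.
import numpy as np

def match_keypoints(desc_a, desc_b, cls_a, cls_b):
    """Mutual-nearest-neighbour matching, restricted to keypoint pairs
    that were assigned the same predicted object class."""
    sim = desc_a @ desc_b.T                    # cosine similarity (unit-norm rows)
    nn_ab = sim.argmax(axis=1)                 # best match in B for each keypoint in A
    nn_ba = sim.argmax(axis=0)                 # best match in A for each keypoint in B
    return [(i, int(j)) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and cls_a[i] == cls_b[j]]

# toy usage with random unit-norm descriptors
rng = np.random.default_rng(0)
da = rng.normal(size=(5, 64)); da /= np.linalg.norm(da, axis=1, keepdims=True)
db = rng.normal(size=(6, 64)); db /= np.linalg.norm(db, axis=1, keepdims=True)
print(match_keypoints(da, db, cls_a=[0] * 5, cls_b=[0] * 6))
```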
Citations: 1
Ventral-Dorsal Neural Networks: Object Detection Via Selective Attention
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00110
M. K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson Reesee, Azadeh Moghtaderi, Ming-Hsuan Yang, D. Noelle
Deep Convolutional Neural Networks (CNNs) have been repeatedly proven to perform well on image classification tasks. Object detection methods, however, are still in need of significant improvements. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets) which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition, the "what" of the signal, and extracting location-related information, the "where" of the signal. The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by "what" information, while the dorsal pathway, into the parietal lobe, is dominated by "where" information. Inspired by this structure, we propose the integration of a "Ventral Network" and a "Dorsal Network", which are complementary. Information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual network framework sharpens the focus of object detection. Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and on PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images displays substantial qualitative and quantitative benefits of VDNet.
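A minimal PyTorch sketch of the two-stream idea under an assumed gating scheme: the dorsal ("where") stream predicts a saliency map that multiplicatively attends the input before the ventral ("what") stream classifies it. All layer shapes and module names here are illustrative, since the abstract does not specify the actual architectures.

```python
# Hedged sketch: a "where" stream gates the input before the "what" stream,
# loosely mirroring the VDNet idea; not the paper's actual architecture.
import torch
import torch.nn as nn

class VentralDorsalSketch(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # dorsal ("where") stream: predicts a spatial saliency map in [0, 1]
        self.dorsal = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid())
        # ventral ("what") stream: classifies the attended image
        self.ventral = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward(self, x):
        where = self.dorsal(x)      # B x 1 x H x W saliency
        attended = x * where        # suppress irrelevant regions
        return self.ventral(attended), where

logits, saliency = VentralDorsalSketch()(torch.randn(2, 3, 64, 64))
print(logits.shape, saliency.shape)   # [2, 20] and [2, 1, 64, 64]
```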
Citations: 18
Local Color Mapping Combined with Color Transfer for Underwater Image Enhancement
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00157
R. Protasiuk, Adel Bibi, Bernard Ghanem
Color correction and color transfer methods have gained a lot of attention in the past few years to circumvent color degradation that may occur due to various sources. In this paper, we propose a novel, simple yet powerful strategy to profoundly enhance color-distorted underwater images. The proposed approach combines both local and global information through a simple yet powerful affine transform model. Local and global information are carried through local color mapping and color covariance mapping between an input and some reference source, respectively. Several experiments on degraded underwater images demonstrate that the proposed method performs favourably against all other methods, including ones that are tailored to correcting underwater images by explicit noise modelling.
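The global branch can be illustrated with a standard colour-covariance transfer, which is one plausible reading of the abstract's "color covariance mapping" rather than the authors' exact formulation: a single affine map takes the input's colour mean and covariance to the reference's.

```python
# Hedged sketch: global affine colour transfer that matches the input's
# colour mean and covariance to a reference image's statistics.
import numpy as np
from scipy.linalg import sqrtm

def covariance_transfer(src, ref):
    """src, ref: float arrays of shape (H, W, 3) with values in [0, 1]."""
    s, r = src.reshape(-1, 3), ref.reshape(-1, 3)
    mu_s, mu_r = s.mean(axis=0), r.mean(axis=0)
    cov_s = np.cov(s, rowvar=False) + 1e-6 * np.eye(3)   # regularise
    cov_r = np.cov(r, rowvar=False) + 1e-6 * np.eye(3)
    A = (sqrtm(cov_r) @ np.linalg.inv(sqrtm(cov_s))).real  # A cov_s A^T = cov_r
    out = (s - mu_s) @ A.T + mu_r
    return np.clip(out, 0.0, 1.0).reshape(src.shape)

rng = np.random.default_rng(1)
result = covariance_transfer(rng.random((32, 32, 3)), rng.random((32, 32, 3)))
print(result.shape)   # (32, 32, 3)
```

The local colour mapping branch and the rule for fusing the two branches are not detailed in the abstract, so they are omitted here.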
Citations: 20
Zero Shot License Plate Re-Identification
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00087
Mayank Gupta, Abhinav Kumar, S. Madhvanath
The problem of person, vehicle or license plate re-identification is generally treated as a multi-shot image retrieval problem. The objective of these tasks is to learn a feature representation of query images (called a "signature") and then use these signatures to match against a database of template image signatures with the aid of a distance metric. In this paper, we propose a novel approach for license plate Re-ID inspired by Zero Shot Learning. The core idea is to generate template signatures for retrieval purposes from a multi-hot text encoding of license plates instead of their images. The proposed method maps license plate images and their license plate numbers to a common embedding space using a Symmetric Triplet loss function so that an image can be queried against its text. In effect, our approach makes it possible to identify license plates whose images have never been seen before, using a large text database of license plate numbers. We show that our system is capable of highly accurate and fast re-identification of license plates, and its performance compares favorably to both OCR-based approaches as well as state-of-the-art image-based Re-ID approaches. In addition to the advantages of avoiding manual image labeling and the ease of creating signature databases, the minimal time and storage requirements enable our system to be deployed even on portable devices.
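A sketch of the multi-hot text encoding under assumed plate conventions (the alphabet and maximum length below are illustrative): each character position contributes a one-hot block, and the concatenation is what the text branch would embed alongside the image branch.

```python
# Hedged sketch: multi-hot encoding of a plate string, one one-hot block per
# character position. ALPHABET and max_len are illustrative assumptions.
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def multi_hot(plate, max_len=10):
    """Encode a plate as a (max_len * len(ALPHABET),) multi-hot vector."""
    enc = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(plate[:max_len]):
        enc[pos, ALPHABET.index(ch)] = 1.0
    return enc.reshape(-1)

v = multi_hot("KA01AB1234")
print(v.shape, v.sum())   # (360,) 10.0: one active entry per character
```

A Symmetric Triplet loss would then pull an image embedding toward the text embedding of its own plate and away from other plates' embeddings, with image and text playing interchangeable anchor roles; its exact form is not given in the abstract.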
Citations: 1
HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00069
Georg Waltner, M. Opitz, Horst Possegger, H. Bischof
When the number of categories grows into the thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by learning distance metric methods that separate categories in a transformed embedding space. Unlike most methods that utilize a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While in general metric learning is directly applied on fine labels to learn embeddings, we take this one step further and incorporate hierarchical label information into the boosting framework and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings which focus on specific hierarchical classes, the retrieval accuracy can be improved compared to standard flat label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or when additional labels can be retrieved without much effort. Our approach improves R@1 over state-of-the-art methods on the biggest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We show that the clustering quality in terms of NMI score is superior to that of previous works.
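A hedged sketch of the boosted-ensemble idea: the embedding is split into sub-embeddings, each trained with a batch-hard triplet loss against one hierarchy level's labels, from coarse to fine. The split sizes, margin, and mining strategy are assumptions, not the paper's settings.

```python
# Hedged sketch: one sub-embedding per hierarchy level, each trained with a
# batch-hard triplet loss on that level's labels; sizes/weights illustrative.
import torch
import torch.nn.functional as F

def hierarchical_triplet_loss(emb, labels_per_level, splits=(128, 128, 256),
                              margin=0.2):
    """emb: B x D embedding; labels_per_level: list of B-long label tensors,
    coarse to fine, one per split."""
    total, start = 0.0, 0
    for size, labels in zip(splits, labels_per_level):
        sub = F.normalize(emb[:, start:start + size], dim=1)
        d = torch.cdist(sub, sub)                    # pairwise distances
        pos = labels[:, None] == labels[None, :]     # same class at this level
        hardest_pos = (d * pos).max(dim=1).values    # farthest same-class sample
        hardest_neg = (d + 1e9 * pos).min(dim=1).values  # closest other-class sample
        total = total + F.relu(hardest_pos - hardest_neg + margin).mean()
        start += size
    return total

emb = torch.randn(8, 512, requires_grad=True)
labels = [torch.randint(0, 2, (8,)), torch.randint(0, 4, (8,)),
          torch.randint(0, 8, (8,))]
print(hierarchical_triplet_loss(emb, labels).item())
```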
Citations: 2
Automatic Detection and Segmentation of Lentil Crop Breeding Plots From Multi-Spectral Images Captured by UAV-Mounted Camera
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00183
Imran Ahmed, M. Eramian, I. Ovsyannikov, William van der Kamp, K. Nielsen, H. Duddu, Arafia Rumali, S. Shirtliffe, K. Bett
Unmanned Aerial Vehicles (UAVs) paired with image detection and segmentation techniques can be used to extract plant phenotype information of individual breeding or research plots. Each plot contains plants of a single genetic line. Breeders are interested in selecting lines with preferred phenotypes (physical traits) that increase crop yield or resilience. Automated detection and segmentation of plots would enable automatic monitoring and quantification of plot phenotypes, allowing a faster selection process that requires far fewer person-hours compared with manual assessment. A detection algorithm based on Laplacian of Gaussian (LoG) blob detection and a segmentation algorithm based on a combination of unsupervised clustering and random walker image segmentation are proposed to detect and segment lentil plots from multi-spectral aerial images. Our algorithm detects and segments lentil plots from normalized difference vegetation index (NDVI) images. The detection algorithm exhibited an average precision and recall of 96.3% and 97.2%, respectively. The average Dice similarity coefficient between a detected segmented plot and its ground truth was 0.906.
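A minimal end-to-end sketch of the two stages on a synthetic NDVI image using scikit-image; the sigma range, NDVI threshold, and seeding scheme are illustrative stand-ins for the paper's tuned pipeline.

```python
# Hedged sketch: LoG blob detection seeds plot centres on an NDVI image,
# then random-walker segmentation grows the plots from those seeds.
import numpy as np
from skimage.feature import blob_log
from skimage.segmentation import random_walker

ndvi = np.zeros((100, 100))
ndvi[20:40, 20:40] = 0.8                            # synthetic vegetated plot
ndvi += 0.05 * np.random.default_rng(0).random(ndvi.shape)

# rows of `blobs` are (y, x, sigma); parameters are illustrative
blobs = blob_log(ndvi, min_sigma=5, max_sigma=15, threshold=0.1)

labels = np.zeros_like(ndvi, dtype=int)
labels[ndvi < 0.2] = 1                              # background seeds
for y, x, _ in blobs:
    labels[int(y), int(x)] = 2                      # plot seeds at blob centres
seg = random_walker(ndvi, labels)
print(len(blobs), (seg == 2).sum())                 # detected blobs, plot pixels
```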
Citations: 20
Skeleton-Based Action Recognition of People Handling Objects
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00014
Sunoh Kim, Kimin Yun, Jongyoul Park, J. Choi
In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions by graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, which include human joints with high confidence scores obtained in pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments over an open benchmark and our own data sets, we verify the validity of our framework, showing that our method outperforms the state-of-the-art method for skeleton-based action recognition.
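A hedged sketch of the two ingredients named above: confidence-based frame sampling followed by one graph convolution over the joints. The chain-shaped skeleton, joint count, and single layer are illustrative only.

```python
# Hedged sketch: keep the most confident pose frames, then run one graph
# convolution over the joints; adjacency and sizes are illustrative.
import torch

def sample_confident_frames(conf, k):
    """conf: T x J per-joint confidence; keep the k frames with the
    highest mean joint confidence, in temporal order."""
    return conf.mean(dim=1).topk(k).indices.sort().values

def gcn_layer(x, adj, weight):
    """x: T x J x C joint features; adj: J x J row-normalised adjacency."""
    return torch.relu(adj @ x @ weight)   # aggregate neighbours, then project

T, J, C = 30, 15, 3
conf = torch.rand(T, J)
idx = sample_confident_frames(conf, k=8)
x = torch.rand(T, J, C)[idx]                         # poses of the sampled frames
adj = torch.eye(J) + torch.diag(torch.ones(J - 1), 1) \
      + torch.diag(torch.ones(J - 1), -1)            # chain-shaped skeleton
adj = adj / adj.sum(dim=1, keepdim=True)             # row-normalise
out = gcn_layer(x, adj, torch.rand(C, 16))
print(out.shape)                                     # torch.Size([8, 15, 16])
```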
Citations: 27
Fashion Attributes-to-Image Synthesis Using Attention-Based Generative Adversarial Network
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00055
Hanbit Lee, Sang-goo Lee
In this paper, we present a method to generate fashion product images that are consistent with a given set of fashion attributes. Since distinct fashion attributes are related to different local sub-regions of a product image, we propose to use a generative adversarial network with an attentional discriminator. The attribute-attended loss signal from the discriminator leads the generator to generate images more consistent with the given attributes. In addition, we present a generator based on a Product-of-Gaussian to encode the composition of fashion attributes in an effective way. To verify whether the proposed model generates consistent images, an oracle attribute classifier is trained to judge the consistency between the given attributes and the generated images. Our model significantly outperforms the baseline model in terms of correctness measured by the pre-trained oracle classifier. We show not only qualitative performance but also synthesized images with various combinations of attributes, so we can compare them with the baseline model.
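The Product-of-Gaussian composition admits a simple closed form, sketched below under the assumption of one diagonal Gaussian per attribute (the attribute encoders producing each mean and variance are omitted): the product of Gaussians is again Gaussian, with summed precisions and a precision-weighted mean.

```python
# Hedged sketch: Product-of-Gaussian fusion of per-attribute diagonal
# Gaussians over a latent code; encoders and the GAN itself are omitted.
import numpy as np

def product_of_gaussians(mus, sigmas):
    """mus, sigmas: lists of (D,) means and standard deviations."""
    precisions = [1.0 / s**2 for s in sigmas]
    prec = np.sum(precisions, axis=0)                # precisions add up
    mu = np.sum([p * m for p, m in zip(precisions, mus)], axis=0) / prec
    return mu, np.sqrt(1.0 / prec)                   # fused mean and std

# two attribute "experts" over a 4-D latent
mu, sigma = product_of_gaussians(
    [np.array([0., 1., 0., 1.]), np.array([1., 1., 0., 0.])],
    [np.full(4, 1.0), np.full(4, 0.5)])
print(mu, sigma)
```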
Citations: 5
Good Similar Patches for Image Denoising
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00205
Si Lu
Patch-based denoising algorithms like BM3D have achieved outstanding performance. An important idea for the success of these methods is to exploit the recurrence of similar patches in an input image to estimate the underlying image structures. However, in these algorithms, the similar patches used for denoising are obtained via Nearest Neighbour Search (NNS) and are sometimes not optimal. First, due to the existence of noise, NNS can select patches whose noise patterns are similar to that of the reference patch. Second, the unreliable noisy pixels in digital images can bias the patch searching process and result in a loss of color fidelity in the final denoising result. We observe that given a set of good similar patches, their distribution is not necessarily centered at the noisy reference patch and can be approximated by a Gaussian component. Based on this observation, we present a patch searching method that clusters similar patch candidates into patch groups using Gaussian Mixture Model-based clustering, and selects the patch group that contains the reference patch as the final patches for denoising. We also use an unreliable pixel estimation algorithm to pre-process the input noisy images to further improve the patch searching. Our experiments show that our approach can better capture the underlying patch structures and can consistently enable state-of-the-art patch-based denoising algorithms, such as BM3D, LPCA and PLOW, to better denoise images by providing them with patches found by our approach, without modifying these algorithms.
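A hedged sketch of the grouping step using scikit-learn, assuming flattened patch vectors and an illustrative component count: fit a Gaussian mixture to the candidate patches, then keep the component that the reference patch falls into.

```python
# Hedged sketch: GMM clustering of candidate patches; the component that
# contains the reference patch is kept as the "good" similar patches.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_patch_group(candidates, reference, n_components=3):
    """candidates: N x D flattened patches; reference: (D,) flattened patch."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    comp = gmm.fit_predict(candidates)
    ref_comp = gmm.predict(reference[None])[0]   # component of the reference
    return candidates[comp == ref_comp]

rng = np.random.default_rng(0)
cands = np.concatenate([rng.normal(0, 1, (40, 49)),    # cluster near the reference
                        rng.normal(5, 1, (40, 49))])   # structurally different one
good = select_patch_group(cands, reference=np.zeros(49))
print(good.shape)   # roughly (40, 49): the cluster around the reference
```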
Citations: 4
CNN-Based Semantic Segmentation Using Level Set Loss
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00191
Youngeun Kim, Seunghyeon Kim, Taekyung Kim, Changick Kim
These days, Convolutional Neural Networks are widely used in semantic segmentation. However, since CNN-based segmentation networks produce low-resolution outputs with rich semantic information, it is inevitable that spatial details (e.g., small objects and fine boundary information) of segmentation results will be lost. To address this problem, motivated by a variational approach to image segmentation (i.e., level set theory), we propose a novel loss function called the level set loss which is designed to refine spatial details of segmentation results. To deal with multiple classes in an image, we first decompose the ground truth into binary images. Note that each binary image consists of background and regions belonging to a class. Then we convert level set functions into class probability maps and calculate the energy for each class. The network is trained to minimize the weighted sum of the level set loss and the cross-entropy loss. The proposed level set loss improves the spatial details of segmentation results in a time- and memory-efficient way. Furthermore, our experimental results show that the proposed loss function achieves better performance than previous approaches.
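A minimal Chan-Vese-style rendering of such a per-class energy on a single probability map, assuming the binary ground truth defines the inside and outside regions; the paper's exact energy terms and weighting may differ.

```python
# Hedged sketch: Chan-Vese-style level set energy on one class probability
# map, with the binary ground truth splitting inside/outside regions.
import torch

def level_set_loss(prob, gt):
    """prob: B x H x W predicted class probability in [0, 1];
    gt: B x H x W binary ground-truth mask for that class."""
    eps = 1e-6
    c_in = (prob * gt).sum(dim=(1, 2)) / (gt.sum(dim=(1, 2)) + eps)
    c_out = (prob * (1 - gt)).sum(dim=(1, 2)) / ((1 - gt).sum(dim=(1, 2)) + eps)
    e_in = ((prob - c_in[:, None, None]) ** 2 * gt).sum(dim=(1, 2))
    e_out = ((prob - c_out[:, None, None]) ** 2 * (1 - gt)).sum(dim=(1, 2))
    return (e_in + e_out).mean()

prob = torch.rand(2, 32, 32, requires_grad=True)
gt = (torch.rand(2, 32, 32) > 0.5).float()
print(level_set_loss(prob, gt).item())
```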
Citations: 48