Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
J. D. Stets, Zhengqin Li, J. Frisvad, Manmohan Chandraker. WACV 2019. DOI: 10.1109/WACV.2019.00111
The appearance of a transparent object is determined by a combination of refraction and reflection, governed by a complex function of its shape as well as the surrounding environment. Prior work on 3D reconstruction has largely ignored transparent objects due to this challenge, yet they occur frequently in real-world scenes. This paper presents an approach to estimate depths and normals for transparent objects using a single image acquired under a distant but otherwise arbitrary environment map. In particular, we use a deep convolutional neural network (CNN) for this task. Unlike for opaque objects, it is challenging to acquire ground-truth training data for refractive objects; we therefore propose a large-scale synthetic dataset. To accurately capture the image formation process, we use a physically based renderer. We demonstrate that a CNN trained on our dataset learns to reconstruct shape and estimate segmentation boundaries for transparent objects from a single image, while also generalizing to real images at test time. In experiments, we extensively study the properties of our dataset and compare against baselines, demonstrating its utility.
Local Color Mapping Combined with Color Transfer for Underwater Image Enhancement
R. Protasiuk, Adel Bibi, Bernard Ghanem. WACV 2019. DOI: 10.1109/WACV.2019.00157
Color correction and color transfer methods have gained considerable attention in recent years as a way to circumvent the color degradation that can arise from various sources. In this paper, we propose a simple yet powerful strategy to substantially enhance color-distorted underwater images. The proposed approach combines local and global information through a simple affine transform model: local information is carried through local color mapping, and global information through color covariance mapping between the input and a reference source. Several experiments on degraded underwater images demonstrate that the proposed method performs favourably compared to all other methods, including those tailored to correcting underwater images through explicit noise modelling.
HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval
Georg Waltner, M. Opitz, Horst Possegger, H. Bischof. WACV 2019. DOI: 10.1109/WACV.2019.00069
When the number of categories grows into the thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by distance metric learning methods that separate categories in a transformed embedding space. Unlike most methods, which use a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While metric learning is generally applied directly on fine labels to learn embeddings, we take this one step further, incorporate hierarchical label information into the boosting framework, and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings that focus on specific hierarchical classes, retrieval accuracy can be improved compared to standard flat-label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or settings where additional labels can be obtained without much effort. Our approach improves R@1 over state-of-the-art methods on the largest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We also show that the clustering quality in terms of NMI score is superior to previous works.
CNN-Based Semantic Segmentation Using Level Set Loss
Youngeun Kim, Seunghyeon Kim, Taekyung Kim, Changick Kim. WACV 2019. DOI: 10.1109/WACV.2019.00191
These days, convolutional neural networks are widely used in semantic segmentation. However, since CNN-based segmentation networks produce low-resolution outputs with rich semantic information, spatial details (e.g., small objects and fine boundary information) of the segmentation results are inevitably lost. To address this problem, motivated by a variational approach to image segmentation (i.e., level set theory), we propose a novel loss function, the level set loss, designed to refine the spatial details of segmentation results. To deal with multiple classes in an image, we first decompose the ground truth into binary images, where each binary image consists of the background and the regions belonging to one class. We then convert the level set functions into class probability maps and calculate the energy for each class. The network is trained to minimize the weighted sum of the level set loss and the cross-entropy loss. The proposed level set loss improves the spatial details of segmentation results in a time- and memory-efficient way. Furthermore, our experimental results show that the proposed loss function achieves better performance than previous approaches.
Automatic Detection and Segmentation of Lentil Crop Breeding Plots From Multi-Spectral Images Captured by UAV-Mounted Camera
Imran Ahmed, M. Eramian, I. Ovsyannikov, William van der Kamp, K. Nielsen, H. Duddu, Arafia Rumali, S. Shirtliffe, K. Bett. WACV 2019. DOI: 10.1109/WACV.2019.00183
Unmanned aerial vehicles (UAVs) paired with image detection and segmentation techniques can be used to extract plant phenotype information for individual breeding or research plots. Each plot contains plants of a single genetic line, and breeders are interested in selecting lines with preferred phenotypes (physical traits) that increase crop yield or resilience. Automated detection and segmentation of plots would enable automatic monitoring and quantification of plot phenotypes, allowing a faster selection process that requires far fewer person-hours than manual assessment. We propose a detection algorithm based on Laplacian of Gaussian (LoG) blob detection and a segmentation algorithm based on a combination of unsupervised clustering and random walker image segmentation to detect and segment lentil plots from multi-spectral aerial images. Our algorithm operates on normalized difference vegetation index (NDVI) images. The detection algorithm achieved an average precision and recall of 96.3% and 97.2%, respectively, and the average Dice similarity coefficient between a detected, segmented plot and its ground truth was 0.906.
Zero Shot License Plate Re-Identification
Mayank Gupta, Abhinav Kumar, S. Madhvanath. WACV 2019. DOI: 10.1109/WACV.2019.00087
The problem of person, vehicle, or license plate re-identification is generally treated as a multi-shot image retrieval problem. The objective is to learn a feature representation of query images (a "signature") and then match these signatures against a database of template image signatures with the aid of a distance metric. In this paper, we propose a novel approach to license plate re-identification inspired by zero-shot learning. The core idea is to generate template signatures for retrieval from a multi-hot text encoding of license plate numbers instead of their images. The proposed method maps license plate images and their license plate numbers to a common embedding space using a symmetric triplet loss function, so that an image can be queried against its text. In effect, our approach makes it possible to identify license plates whose images have never been seen before, using a large text database of license plate numbers. We show that our system is capable of highly accurate and fast re-identification of license plates, and its performance compares favorably to both OCR-based approaches and state-of-the-art image-based re-identification approaches. In addition to avoiding manual image labeling and easing the creation of signature databases, the minimal time and storage requirements enable our system to be deployed even on portable devices.
Fashion Attributes-to-Image Synthesis Using Attention-Based Generative Adversarial Network
Hanbit Lee, Sang-goo Lee. WACV 2019. DOI: 10.1109/WACV.2019.00055
In this paper, we present a method to generate fashion product images that are consistent with a given set of fashion attributes. Since distinct fashion attributes relate to different local sub-regions of a product image, we propose a generative adversarial network with an attentional discriminator. The attribute-attended loss signal from the discriminator guides the generator to produce images that are more consistent with the given attributes. In addition, we present a generator based on a Product-of-Gaussians formulation to effectively encode compositions of fashion attributes. To verify whether the proposed model generates consistent images, we train an oracle attribute classifier and use it to judge the consistency between the given attributes and the generated images. Our model significantly outperforms the baseline model in terms of correctness as measured by the pre-trained oracle classifier. In addition to these quantitative results, we show images synthesized from various combinations of attributes and compare them with those of the baseline model.
Digging Deeper Into Egocentric Gaze Prediction
H. R. Tavakoli, Esa Rahtu, Juho Kannala, A. Borji. WACV 2019. DOI: 10.1109/WACV.2019.00035
This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose blindly, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial prior baselines. Task-specific cues such as the vanishing point, the manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We further propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up attention models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) in contrast to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point yields the best gaze prediction accuracy on egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets with diverse stimuli and more subjects.
Skeleton-Based Action Recognition of People Handling Objects
Sunoh Kim, Kimin Yun, Jongyoul Park, J. Choi. WACV 2019. DOI: 10.1109/WACV.2019.00014
In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, we propose a new framework for recognizing object-related human actions with graph convolutional networks that use both human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, namely those containing human joints with high confidence scores from pose estimation. The skeletal graphs generated from the sampled frames represent human poses in relation to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments on an open benchmark and our own datasets, we verify the validity of our framework: our method outperforms the state-of-the-art method for skeleton-based action recognition.