Random forest is a well-known and widely used machine learning model. In many applications where the training data arise from real-world sources, there may be labeling errors in the data. Despite its strong performance, the basic random forest model does not consider potential label noise during learning, and thus its performance can suffer significantly in the presence of label noise. To address this problem, we present a new variation of random forest: a novel learning approach that leads to an improved noise-robust random forest (NRRF) model. We incorporate the noise information by introducing a global multi-class noise-tolerant loss function into the training of the classic random forest model. This new loss function was found to significantly boost the performance of random forest. We evaluated the proposed NRRF through extensive classification experiments on standard machine learning/computer vision datasets such as MNIST, Letter, and CIFAR-10. The proposed NRRF produced very promising results under a wide range of noise settings.
{"title":"Improving Robustness of Random Forest Under Label Noise","authors":"Xu Zhou, Pak Lun Kevin Ding, Baoxin Li","doi":"10.1109/WACV.2019.00106","DOIUrl":"https://doi.org/10.1109/WACV.2019.00106","url":null,"abstract":"Random forest is a well-known and widely-used machine learning model. In many applications where the training data arise from real-world sources, there may be labeling errors in the data. In spite of its superior performance, the basic model of random forest dose not consider potential label noise in learning, and thus its performance can suffer significantly in the presence of label noise. In order to solve this problem, we present a new variation of random forest - a novel learning approach that leads to an improved noise robust random forest (NRRF) model. We incorporate the noise information by introducing a global multi-class noise tolerant loss function into the training of the classic random forest model. This new loss function was found to significantly boost the performance of random forest. We evaluated the proposed NRRF by extensive experiments of classification tasks on standard machine learning/computer vision datasets like MNIST, letter and Cifar10. The proposed NRRF produced very promising results under a wide range of noise settings.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121062251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since the advent of deep learning, neural networks have demonstrated remarkable results in many visual recognition tasks, constantly pushing the limits. However, state-of-the-art approaches are largely unsuitable in scarce-data regimes. To address this shortcoming, this paper proposes employing a 3D model derived from training images. Such a model can then be used to hallucinate novel viewpoints and poses for the scarce samples of the few-shot learning scenario. A self-paced learning approach allows for the selection of a diverse set of high-quality images, which facilitates the training of a classifier. The performance of the proposed approach is showcased on the fine-grained CUB-200-2011 dataset in a few-shot setting and significantly improves upon our baseline accuracy.
{"title":"Low-Shot Learning From Imaginary 3D Model","authors":"Frederik Pahde, M. Puscas, Jannik Wolff, T. Klein, N. Sebe, Moin Nabi","doi":"10.1109/WACV.2019.00109","DOIUrl":"https://doi.org/10.1109/WACV.2019.00109","url":null,"abstract":"Since the advent of deep learning, neural networks have demonstrated remarkable results in many visual recognition tasks, constantly pushing the limits. However, the state-of-the-art approaches are largely unsuitable in scarce data regimes. To address this shortcoming, this paper proposes employing a 3D model, which is derived from training images. Such a model can then be used to hallucinate novel viewpoints and poses for the scarce samples of the few-shot learning scenario. A self-paced learning approach allows for the selection of a diverse set of high-quality images, which facilitates the training of a classifier. The performance of the proposed approach is showcased on the fine-grained CUB-200-2011 dataset in a few-shot setting and significantly improves our baseline accuracy.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"343 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133154137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In light of human studies that report a strong correlation between head circumference and body size, we propose a new research problem: head-body matching. Given an image of a person's head, we want to match it with the corresponding (headless) body image. We propose a dual-pathway framework that computes discriminative head and body features independently and learns the correlation between such features. We introduce a comprehensive evaluation of our proposed framework for this problem using different features, including anthropometric features and deep-CNN features, different experimental settings such as head-body scale variations, and different body parts. We demonstrate the usefulness of our framework with two novel applications: head/body recognition and T-shirt sizing from a head image. Our evaluation of the head/body recognition application on the challenging large-scale PIPA dataset (which contains high variations of pose, viewpoint, and occlusion) shows up to 53% performance improvement using deep-CNN features over global model features in which head and body features are neither separated nor correlated. For the T-shirt sizing application, we use anthropometric features for head-body matching. We achieve promising experimental results on small and challenging datasets.
{"title":"Which Body Is Mine?","authors":"M. R. Sayed, T. Sim, Joo-Hwee Lim, K. Ma","doi":"10.1109/WACV.2019.00093","DOIUrl":"https://doi.org/10.1109/WACV.2019.00093","url":null,"abstract":"In the light of the human studies that report a strong correlation between head circumference and body size, we propose a new research problem: head-body matching. Given an image of a person's head, we want to match it with his body (headless) image. We propose a dual-pathway framework which computes head and body discriminating features independently, and learns the correlation between such features. We introduce a comprehensive evaluation of our proposed framework for this problem using different features including anthropometric features and deep-CNN features, different experimental setting such as head-body scale variations, and different body parts. We demonstrate the usefulness of our framework with two novel applications: head/body recognition, and T-shirt sizing from a head image. Our evaluations for head/body recognition application on the challenging large scale PIPA dataset (contains high variations of pose, viewpoint, and occlusion) show up to 53% of performance improvement using deep-CNN features, over the global model features in which head and body features are not separated or correlated. For T-shirt sizing application, we use anthropometric features for head-body matching. We achieve promising experimental results on small and challenging datasets.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134132407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active learning algorithms automatically identify the salient and informative samples from large amounts of unlabeled data and tremendously reduce human annotation effort in inducing a machine learning model. In a multi-class classification problem, however, the human oracle has to provide the precise category label of each unlabeled sample to be annotated. In an application with a significantly large (and possibly unknown) number of classes (such as object recognition), providing the exact class label may be time-consuming and error-prone. In this paper, we propose a novel active learning framework where the annotator merely needs to identify which of the selected n categories a given unlabeled sample belongs to (where n is much smaller than the actual number of classes). We pose the active sample selection as an NP-hard integer quadratic programming problem and exploit the Iterative Truncated Power algorithm to derive an efficient solution. To the best of our knowledge, this is the first research effort to propose a generic n-ary query framework for active sample selection. Our extensive empirical results on six challenging vision datasets (from four different application domains, with the number of classes ranging from 10 to 369) corroborate the potential of the framework in further reducing human annotation effort in real-world active learning applications.
{"title":"Active Learning with n-ary Queries for Image Recognition","authors":"Aditya R. Bhattacharya, Shayok Chakraborty","doi":"10.1109/WACV.2019.00090","DOIUrl":"https://doi.org/10.1109/WACV.2019.00090","url":null,"abstract":"Active learning algorithms automatically identify the salient and informative samples from large amounts of unlabeled data and tremendously reduce human annotation effort in inducing a machine learning model. In a multi-class classification problem, however, the human oracle has to provide the precise category label of each unlabeled sample to be annotated. In an application with a significantly large (and possibly unknown) number of classes (such as object recognition), providing the exact class label may be time consuming and error prone. In this paper, we propose a novel active learning framework where the annotator merely needs to identify which of the selected n categories a given unlabeled sample belongs to (where n is much smaller than the actual number of classes). We pose the active sample selection as an NP-hard integer quadratic programming problem and exploit the Iterative Truncated Power algorithm to derive an efficient solution. To the best of our knowledge, this is the first research effort to propose a generic n-ary query framework for active sample selection. Our extensive empirical results on six challenging vision datasets (from four different application domains and varied number of classes ranging from 10 to 369) corroborate the potential of the framework in further reducing human annotation effort in real-world active learning applications.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133652925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot learning (ZSL), which aims to learn new concepts without any labeled training data, is a promising solution to large-scale concept learning. Recently, many works implement zero-shot learning by transferring structural knowledge from the semantic embedding space to the image feature space. However, we observe that such direct knowledge transfer may suffer from the space shift problem, in the form of inconsistent geometric structures between the training and testing spaces. To alleviate this problem, we propose a novel method that performs recurrent knowledge transfer (RecKT) between the two spaces. Specifically, we unite the two spaces into a joint embedding space in which unseen image data are missing. The proposed method provides a synthesis-refinement mechanism to learn the shared subspace structure (SSS) and synthesize the missing data simultaneously in the joint embedding space. The synthesized unseen image data are utilized to construct the classifier for unseen classes. Experimental results show that our method outperforms the state of the art on three popular datasets. The ablation experiments and a visualization of the learning process illustrate how our method alleviates the space shift problem. As a by-product, our method provides a perspective for interpreting ZSL performance by applying subspace clustering to the learned SSS.
{"title":"Zero-Shot Learning Via Recurrent Knowledge Transfer","authors":"Bo Zhao, Xinwei Sun, Xiaopeng Hong, Y. Yao, Yizhou Wang","doi":"10.1109/WACV.2019.00144","DOIUrl":"https://doi.org/10.1109/WACV.2019.00144","url":null,"abstract":"Zero-shot learning (ZSL) which aims to learn new concepts without any labeled training data is a promising solution to large-scale concept learning. Recently, many works implement zero-shot learning by transferring structural knowledge from the semantic embedding space to the image feature space. However, we observe that such direct knowledge transfer may suffer from the space shift problem in the form of the inconsistency of geometric structures in the training and testing spaces. To alleviate this problem, we propose a novel method which actualizes recurrent knowledge transfer (RecKT) between the two spaces. Specifically, we unite the two spaces into the joint embedding space in which unseen image data are missing. The proposed method provides a synthesis-refinement mechanism to learn the shared subspace structure (SSS) and synthesize missing data simultaneously in the joint embedding space. The synthesized unseen image data are utilized to construct the classifier for unseen classes. Experimental results show that our method outperforms the state-of-the-art on three popular datasets. The ablation experiment and visualization of the learning process illustrate how our method can alleviate the space shift problem. By product, our method provides a perspective to interpret the ZSL performance by implementing subspace clustering on the learned SSS.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115069616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We explore the use of knowledge graphs, which capture general or commonsense knowledge, to augment the information extracted from images by state-of-the-art methods for image captioning. We compare the performance of image captioning systems, as measured by CIDEr-D (a performance measure explicitly designed for evaluating image captioning systems), on several benchmark datasets such as MS COCO. The results of our experiments show that variants of state-of-the-art image captioning methods that make use of information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images.
{"title":"Improving Image Captioning by Leveraging Knowledge Graphs","authors":"Yimin Zhou, Yiwei Sun, Vasant G Honavar","doi":"10.1109/WACV.2019.00036","DOIUrl":"https://doi.org/10.1109/WACV.2019.00036","url":null,"abstract":"We explore the use of a knowledge graphs, that capture general or commonsense knowledge, to augment the information extracted from images by the state-of-the-art methods for image captioning. We compare the performance of image captioning systems that as measured by CIDEr-D, a performance measure that is explicitly designed for evaluating image captioning systems, on several benchmark data sets such as MS COCO. The results of our experiments show that the variants of the state-of-the-art methods for image captioning that make use of the information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"322 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132775719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elimination of moving shadows is an essential step toward accurate vehicle detection and localization in automated traffic surveillance systems that detect vehicles in road scenes captured by surveillance cameras. However, this is still a challenging problem, as existing pixel-based methods miss parts of vehicles while region-based methods, although accurate, incur a higher computational cost. In this paper, we propose a highly accurate yet low-complexity block-based moving shadow elimination technique, which can effectively deal with varying shadow conditions. A novel shadow elimination pipeline is proposed that employs computationally lean features to quickly classify distinct vehicles from shadows, and uses a more sophisticated interior edge feature only for the classification of difficult scenarios. Extensive evaluations on freely available and self-collected datasets demonstrate that the proposed technique achieves higher accuracy than other state-of-the-art techniques in varying scenarios. Additionally, it achieves a more than 20x speedup on a low-cost embedded platform, the Odroid XU-4, over a state-of-the-art technique that achieves comparable accuracy. Experimental results confirm the real-time capability of the proposed approach while achieving robustness to varying shadow scenarios.
{"title":"Rapid Technique to Eliminate Moving Shadows for Accurate Vehicle Detection","authors":"Kratika Garg, N. Ramakrishnan, Alok Prakash, T. Srikanthan, Punit Bhatt","doi":"10.1109/WACV.2019.00214","DOIUrl":"https://doi.org/10.1109/WACV.2019.00214","url":null,"abstract":"Elimination of moving shadows is an essential step to achieve accurate vehicle detection and localization in automated traffic surveillance systems that aim to detect vehicles on road scenes captured by surveillance cameras. However, this is still a challenging problem as existing pixel based methods miss parts of vehicles and region-based methods, while accurate, incur higher computations. In this paper, we propose a highly accurate yet low-complexity block-based moving shadow elimination technique, which can effectively deal with varying shadow conditions. A novel shadow elimination pipeline is proposed that employs computationally lean features to quickly classify distinct vehicles from shadows, and uses a more sophisticated interior edge feature only for classification of difficult scenarios. Extensive evaluations on freely available and self-collected datasets demonstrate that the proposed technique achieves higher accuracy than other state-of-the-art techniques in varying scenarios. Additionally, it also achieves over 20 times speedup on a low-cost embedded platform, Odroid XU-4, over a state-of-the-art technique that achieves comparable accuracy. Experimental results confirm the realtime capability of the proposed approach while achieving robustness to varying shadow scenarios.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128993795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Still-image emotion recognition has been receiving increasing attention in recent years due to the tremendous amount of social media content available on the Web. Opinion mining, visual emotion analysis, and search and retrieval are among the application areas, to name a few. While there exist works on the subject offering methods to detect image sentiment, i.e., the polarity of an image, fewer efforts focus on emotion analysis, i.e., recognizing the exact emotion aroused by a given visual stimulus. The main gaps tackled in this work are (1) the lack of large-scale image datasets for deep learning of visual emotions and (2) the lack of context-sensitive single-modality approaches to emotion analysis in the still-image domain. In this paper, we introduce LUCFER (pronounced LU-CI-FER), a dataset containing over 3.6M images with three-dimensional labels: emotion, context, and valence. LUCFER, the largest dataset of its kind currently available, is collected using a novel data collection pipeline proposed and implemented in this work. Moreover, we train a context-sensitive deep classifier using a novel multinomial classification technique, proposed here by adding a dimensionality reduction layer to the CNN. Relying on our categorical approach to emotion recognition, we claim and show empirically that injecting context into our unified training process helps (1) achieve a more balanced precision and recall, and (2) boost performance, yielding an overall classification accuracy of 73.12% compared to 58.3% achieved by the closest work in the literature.
{"title":"LUCFER: A Large-Scale Context-Sensitive Image Dataset for Deep Learning of Visual Emotions","authors":"Pooyan Balouchian, M. Safaei, H. Foroosh","doi":"10.1109/WACV.2019.00180","DOIUrl":"https://doi.org/10.1109/WACV.2019.00180","url":null,"abstract":"Still image emotion recognition has been receiving increasing attention in recent years due to the tremendous amount of social media content available on the Web. Opinion mining, visual emotion analysis, search and retrieval are among the application areas, to name a few. While there exist works on the subject, offering methods to detect image sentiment; i.e. recognizing the polarity of the image, less efforts focus on emotion analysis; i.e. dealing with recognizing the exact emotion aroused when exposed to certain visual stimuli. Main gaps tackled in this work include (1) lack of large-scale image datasets for deep learning of visual emotions and (2) lack of context-sensitive single-modality approaches in emotion analysis in the still image domain. In this paper, we introduce LUCFER (Pronounced LU-CI-FER), a dataset containing over 3.6M images, with 3-dimensional labels; i.e. emotion, context and valence. LUCFER, the largest dataset of the kind currently available, is collected using a novel data collection pipeline, proposed and implemented in this work. Moreover, we train a context-sensitive deep classifier using a novel multinomial classification technique proposed here via adding a dimensionality reduction layer to the CNN. Relying on our categorical approach to emotion recognition, we claim and show empirically that injecting context to our unified training process helps (1) achieve a more balanced precision and recall, and (2) boost performance, yielding an overall classification accuracy of 73.12% compared to 58.3% achieved in the closest work in the literature.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121312296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a new thermal-empowered multi-task network (TEMT-Net) to improve facial action unit detection. Our primary goal is to leverage the setting in which the training set contains multi-modality data while the application scenario has only one modality. Thermal images are robust to illumination changes and face color. In the proposed multi-task framework, we utilize data from both modalities. Action unit detection and facial landmark detection are correlated tasks. To exploit the advantages and correlations of the different modalities and tasks, we propose a novel thermal-empowered multi-task deep neural network learning approach for action unit detection, facial landmark detection, and thermal image reconstruction simultaneously. The thermal image generator and the facial landmark detector regularize the learned features, which share factors with the input color images. Extensive experiments are conducted on the BP4D and MMSE databases, with comparisons to state-of-the-art methods. The experiments show that the multi-modality framework improves AU detection significantly.
{"title":"Multi-Modality Empowered Network for Facial Action Unit Detection","authors":"Peng Liu, Zheng Zhang, Huiyuan Yang, L. Yin","doi":"10.1109/WACV.2019.00235","DOIUrl":"https://doi.org/10.1109/WACV.2019.00235","url":null,"abstract":"This paper presents a new thermal empowered multi-task network (TEMT-Net) to improve facial action unit detection. Our primary goal is to leverage the situation that the training set has multi-modality data while the application scenario only has one modality. Thermal images are robust to illumination and face color. In the proposed multi-task framework, we utilize both modality data. Action unit detection and facial landmark detection are correlated tasks. To utilize the advantage and the correlation of different modalities and different tasks, we propose a novel thermal empowered multi-task deep neural network learning approach for action unit detection, facial landmark detection and thermal image reconstruction simultaneously. The thermal image generator and facial landmark detection provide regularization on the learned features with shared factors as the input color images. Extensive experiments are conducted on the BP4D and MMSE databases, with the comparison to the state of the art methods. The experiments show that the multi-modality framework improves the AU detection significantly.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117240441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data, however, has been crucial to the recent advances in image understanding achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation, and it has a modular structure that can be easily adapted to any image dataset.
{"title":"FreeLabel: A Publicly Available Annotation Tool Based on Freehand Traces","authors":"P. Dias, Zhou Shen, A. Tabb, Henry Medeiros","doi":"10.1109/WACV.2019.00010","DOIUrl":"https://doi.org/10.1109/WACV.2019.00010","url":null,"abstract":"Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data has been, however, crucial for the advances on image understanding tasks recently achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced or private annotation and has a modular structure that can be easily adapted for any image dataset.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125361209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}