Random forest is a well-known and widely used machine learning model. In many applications where the training data arise from real-world sources, there may be labeling errors in the data. Despite its strong performance, the basic random forest model does not consider potential label noise during learning, and thus its performance can suffer significantly in the presence of label noise. To address this problem, we present a new variation of random forest: a novel learning approach that leads to an improved noise-robust random forest (NRRF) model. We incorporate the noise information by introducing a global multi-class noise-tolerant loss function into the training of the classic random forest model. This new loss function was found to significantly boost the performance of random forest. We evaluated the proposed NRRF through extensive classification experiments on standard machine learning/computer vision datasets such as MNIST, Letter, and CIFAR-10. The proposed NRRF produced very promising results under a wide range of noise settings.
{"title":"Improving Robustness of Random Forest Under Label Noise","authors":"Xu Zhou, Pak Lun Kevin Ding, Baoxin Li","doi":"10.1109/WACV.2019.00106","DOIUrl":"https://doi.org/10.1109/WACV.2019.00106","url":null,"abstract":"Random forest is a well-known and widely-used machine learning model. In many applications where the training data arise from real-world sources, there may be labeling errors in the data. In spite of its superior performance, the basic model of random forest dose not consider potential label noise in learning, and thus its performance can suffer significantly in the presence of label noise. In order to solve this problem, we present a new variation of random forest - a novel learning approach that leads to an improved noise robust random forest (NRRF) model. We incorporate the noise information by introducing a global multi-class noise tolerant loss function into the training of the classic random forest model. This new loss function was found to significantly boost the performance of random forest. We evaluated the proposed NRRF by extensive experiments of classification tasks on standard machine learning/computer vision datasets like MNIST, letter and Cifar10. The proposed NRRF produced very promising results under a wide range of noise settings.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121062251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since the advent of deep learning, neural networks have demonstrated remarkable results in many visual recognition tasks, constantly pushing the limits. However, state-of-the-art approaches are largely unsuitable in scarce-data regimes. To address this shortcoming, this paper proposes employing a 3D model derived from training images. Such a model can then be used to hallucinate novel viewpoints and poses for the scarce samples of the few-shot learning scenario. A self-paced learning approach allows for the selection of a diverse set of high-quality images, which facilitates the training of a classifier. The performance of the proposed approach is showcased on the fine-grained CUB-200-2011 dataset in a few-shot setting and significantly improves upon our baseline accuracy.
{"title":"Low-Shot Learning From Imaginary 3D Model","authors":"Frederik Pahde, M. Puscas, Jannik Wolff, T. Klein, N. Sebe, Moin Nabi","doi":"10.1109/WACV.2019.00109","DOIUrl":"https://doi.org/10.1109/WACV.2019.00109","url":null,"abstract":"Since the advent of deep learning, neural networks have demonstrated remarkable results in many visual recognition tasks, constantly pushing the limits. However, the state-of-the-art approaches are largely unsuitable in scarce data regimes. To address this shortcoming, this paper proposes employing a 3D model, which is derived from training images. Such a model can then be used to hallucinate novel viewpoints and poses for the scarce samples of the few-shot learning scenario. A self-paced learning approach allows for the selection of a diverse set of high-quality images, which facilitates the training of a classifier. The performance of the proposed approach is showcased on the fine-grained CUB-200-2011 dataset in a few-shot setting and significantly improves our baseline accuracy.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"343 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133154137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In light of human studies that report a strong correlation between head circumference and body size, we propose a new research problem: head-body matching. Given an image of a person's head, we want to match it with the corresponding (headless) body image. We propose a dual-pathway framework that computes discriminative head and body features independently and learns the correlation between such features. We introduce a comprehensive evaluation of our proposed framework for this problem using different features, including anthropometric features and deep-CNN features, different experimental settings such as head-body scale variations, and different body parts. We demonstrate the usefulness of our framework with two novel applications: head/body recognition and T-shirt sizing from a head image. Our evaluation of the head/body recognition application on the challenging large-scale PIPA dataset (which contains high variations of pose, viewpoint, and occlusion) shows up to 53% performance improvement using deep-CNN features over global model features in which head and body features are neither separated nor correlated. For the T-shirt sizing application, we use anthropometric features for head-body matching. We achieve promising experimental results on small and challenging datasets.
{"title":"Which Body Is Mine?","authors":"M. R. Sayed, T. Sim, Joo-Hwee Lim, K. Ma","doi":"10.1109/WACV.2019.00093","DOIUrl":"https://doi.org/10.1109/WACV.2019.00093","url":null,"abstract":"In the light of the human studies that report a strong correlation between head circumference and body size, we propose a new research problem: head-body matching. Given an image of a person's head, we want to match it with his body (headless) image. We propose a dual-pathway framework which computes head and body discriminating features independently, and learns the correlation between such features. We introduce a comprehensive evaluation of our proposed framework for this problem using different features including anthropometric features and deep-CNN features, different experimental setting such as head-body scale variations, and different body parts. We demonstrate the usefulness of our framework with two novel applications: head/body recognition, and T-shirt sizing from a head image. Our evaluations for head/body recognition application on the challenging large scale PIPA dataset (contains high variations of pose, viewpoint, and occlusion) show up to 53% of performance improvement using deep-CNN features, over the global model features in which head and body features are not separated or correlated. For T-shirt sizing application, we use anthropometric features for head-body matching. We achieve promising experimental results on small and challenging datasets.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134132407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active learning algorithms automatically identify the salient and informative samples from large amounts of unlabeled data and tremendously reduce human annotation effort in inducing a machine learning model. In a multi-class classification problem, however, the human oracle has to provide the precise category label of each unlabeled sample to be annotated. In an application with a significantly large (and possibly unknown) number of classes (such as object recognition), providing the exact class label may be time-consuming and error-prone. In this paper, we propose a novel active learning framework where the annotator merely needs to identify which of the selected n categories a given unlabeled sample belongs to (where n is much smaller than the actual number of classes). We pose the active sample selection as an NP-hard integer quadratic programming problem and exploit the Iterative Truncated Power algorithm to derive an efficient solution. To the best of our knowledge, this is the first research effort to propose a generic n-ary query framework for active sample selection. Our extensive empirical results on six challenging vision datasets (from four different application domains, with the number of classes ranging from 10 to 369) corroborate the potential of the framework in further reducing human annotation effort in real-world active learning applications.
{"title":"Active Learning with n-ary Queries for Image Recognition","authors":"Aditya R. Bhattacharya, Shayok Chakraborty","doi":"10.1109/WACV.2019.00090","DOIUrl":"https://doi.org/10.1109/WACV.2019.00090","url":null,"abstract":"Active learning algorithms automatically identify the salient and informative samples from large amounts of unlabeled data and tremendously reduce human annotation effort in inducing a machine learning model. In a multi-class classification problem, however, the human oracle has to provide the precise category label of each unlabeled sample to be annotated. In an application with a significantly large (and possibly unknown) number of classes (such as object recognition), providing the exact class label may be time consuming and error prone. In this paper, we propose a novel active learning framework where the annotator merely needs to identify which of the selected n categories a given unlabeled sample belongs to (where n is much smaller than the actual number of classes). We pose the active sample selection as an NP-hard integer quadratic programming problem and exploit the Iterative Truncated Power algorithm to derive an efficient solution. To the best of our knowledge, this is the first research effort to propose a generic n-ary query framework for active sample selection. Our extensive empirical results on six challenging vision datasets (from four different application domains and varied number of classes ranging from 10 to 369) corroborate the potential of the framework in further reducing human annotation effort in real-world active learning applications.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133652925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot learning (ZSL), which aims to learn new concepts without any labeled training data, is a promising solution to large-scale concept learning. Recently, many works implement zero-shot learning by transferring structural knowledge from the semantic embedding space to the image feature space. However, we observe that such direct knowledge transfer may suffer from the space shift problem, in the form of inconsistent geometric structures between the training and testing spaces. To alleviate this problem, we propose a novel method that performs recurrent knowledge transfer (RecKT) between the two spaces. Specifically, we unite the two spaces into a joint embedding space in which unseen image data are missing. The proposed method provides a synthesis-refinement mechanism to learn the shared subspace structure (SSS) and synthesize the missing data simultaneously in the joint embedding space. The synthesized unseen image data are utilized to construct the classifier for unseen classes. Experimental results show that our method outperforms the state of the art on three popular datasets. The ablation experiments and a visualization of the learning process illustrate how our method alleviates the space shift problem. As a by-product, our method provides a perspective for interpreting ZSL performance by applying subspace clustering to the learned SSS.
{"title":"Zero-Shot Learning Via Recurrent Knowledge Transfer","authors":"Bo Zhao, Xinwei Sun, Xiaopeng Hong, Y. Yao, Yizhou Wang","doi":"10.1109/WACV.2019.00144","DOIUrl":"https://doi.org/10.1109/WACV.2019.00144","url":null,"abstract":"Zero-shot learning (ZSL) which aims to learn new concepts without any labeled training data is a promising solution to large-scale concept learning. Recently, many works implement zero-shot learning by transferring structural knowledge from the semantic embedding space to the image feature space. However, we observe that such direct knowledge transfer may suffer from the space shift problem in the form of the inconsistency of geometric structures in the training and testing spaces. To alleviate this problem, we propose a novel method which actualizes recurrent knowledge transfer (RecKT) between the two spaces. Specifically, we unite the two spaces into the joint embedding space in which unseen image data are missing. The proposed method provides a synthesis-refinement mechanism to learn the shared subspace structure (SSS) and synthesize missing data simultaneously in the joint embedding space. The synthesized unseen image data are utilized to construct the classifier for unseen classes. Experimental results show that our method outperforms the state-of-the-art on three popular datasets. The ablation experiment and visualization of the learning process illustrate how our method can alleviate the space shift problem. By product, our method provides a perspective to interpret the ZSL performance by implementing subspace clustering on the learned SSS.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115069616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We explore the use of knowledge graphs, which capture general or commonsense knowledge, to augment the information extracted from images by state-of-the-art methods for image captioning. We compare the performance of image captioning systems, as measured by CIDEr-D (a performance measure explicitly designed for evaluating image captioning systems), on several benchmark datasets such as MS COCO. The results of our experiments show that variants of state-of-the-art image captioning methods that make use of information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images.
{"title":"Improving Image Captioning by Leveraging Knowledge Graphs","authors":"Yimin Zhou, Yiwei Sun, Vasant G Honavar","doi":"10.1109/WACV.2019.00036","DOIUrl":"https://doi.org/10.1109/WACV.2019.00036","url":null,"abstract":"We explore the use of a knowledge graphs, that capture general or commonsense knowledge, to augment the information extracted from images by the state-of-the-art methods for image captioning. We compare the performance of image captioning systems that as measured by CIDEr-D, a performance measure that is explicitly designed for evaluating image captioning systems, on several benchmark data sets such as MS COCO. The results of our experiments show that the variants of the state-of-the-art methods for image captioning that make use of the information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"322 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132775719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elimination of moving shadows is an essential step toward accurate vehicle detection and localization in automated traffic surveillance systems that detect vehicles in road scenes captured by surveillance cameras. However, this is still a challenging problem, as existing pixel-based methods miss parts of vehicles while region-based methods, although accurate, incur a higher computational cost. In this paper, we propose a highly accurate yet low-complexity block-based moving shadow elimination technique, which can effectively deal with varying shadow conditions. A novel shadow elimination pipeline is proposed that employs computationally lean features to quickly classify distinct vehicles from shadows, and uses a more sophisticated interior edge feature only for the classification of difficult scenarios. Extensive evaluations on freely available and self-collected datasets demonstrate that the proposed technique achieves higher accuracy than other state-of-the-art techniques in varying scenarios. Additionally, it achieves a more than 20x speedup on a low-cost embedded platform, the Odroid XU-4, over a state-of-the-art technique that achieves comparable accuracy. Experimental results confirm the real-time capability of the proposed approach while achieving robustness to varying shadow scenarios.
{"title":"Rapid Technique to Eliminate Moving Shadows for Accurate Vehicle Detection","authors":"Kratika Garg, N. Ramakrishnan, Alok Prakash, T. Srikanthan, Punit Bhatt","doi":"10.1109/WACV.2019.00214","DOIUrl":"https://doi.org/10.1109/WACV.2019.00214","url":null,"abstract":"Elimination of moving shadows is an essential step to achieve accurate vehicle detection and localization in automated traffic surveillance systems that aim to detect vehicles on road scenes captured by surveillance cameras. However, this is still a challenging problem as existing pixel based methods miss parts of vehicles and region-based methods, while accurate, incur higher computations. In this paper, we propose a highly accurate yet low-complexity block-based moving shadow elimination technique, which can effectively deal with varying shadow conditions. A novel shadow elimination pipeline is proposed that employs computationally lean features to quickly classify distinct vehicles from shadows, and uses a more sophisticated interior edge feature only for classification of difficult scenarios. Extensive evaluations on freely available and self-collected datasets demonstrate that the proposed technique achieves higher accuracy than other state-of-the-art techniques in varying scenarios. Additionally, it also achieves over 20 times speedup on a low-cost embedded platform, Odroid XU-4, over a state-of-the-art technique that achieves comparable accuracy. Experimental results confirm the realtime capability of the proposed approach while achieving robustness to varying shadow scenarios.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128993795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Still-image emotion recognition has been receiving increasing attention in recent years due to the tremendous amount of social media content available on the Web. Opinion mining, visual emotion analysis, and search and retrieval are among the application areas, to name a few. While there exist works on the subject offering methods to detect image sentiment, i.e., the polarity of an image, fewer efforts focus on emotion analysis, i.e., recognizing the exact emotion aroused by a given visual stimulus. The main gaps tackled in this work are (1) the lack of large-scale image datasets for deep learning of visual emotions and (2) the lack of context-sensitive single-modality approaches to emotion analysis in the still-image domain. In this paper, we introduce LUCFER (pronounced LU-CI-FER), a dataset containing over 3.6M images with three-dimensional labels: emotion, context, and valence. LUCFER, the largest dataset of its kind currently available, is collected using a novel data collection pipeline proposed and implemented in this work. Moreover, we train a context-sensitive deep classifier using a novel multinomial classification technique, proposed here by adding a dimensionality reduction layer to the CNN. Relying on our categorical approach to emotion recognition, we claim and show empirically that injecting context into our unified training process helps (1) achieve a more balanced precision and recall, and (2) boost performance, yielding an overall classification accuracy of 73.12% compared to 58.3% achieved by the closest work in the literature.
{"title":"LUCFER: A Large-Scale Context-Sensitive Image Dataset for Deep Learning of Visual Emotions","authors":"Pooyan Balouchian, M. Safaei, H. Foroosh","doi":"10.1109/WACV.2019.00180","DOIUrl":"https://doi.org/10.1109/WACV.2019.00180","url":null,"abstract":"Still image emotion recognition has been receiving increasing attention in recent years due to the tremendous amount of social media content available on the Web. Opinion mining, visual emotion analysis, search and retrieval are among the application areas, to name a few. While there exist works on the subject, offering methods to detect image sentiment; i.e. recognizing the polarity of the image, less efforts focus on emotion analysis; i.e. dealing with recognizing the exact emotion aroused when exposed to certain visual stimuli. Main gaps tackled in this work include (1) lack of large-scale image datasets for deep learning of visual emotions and (2) lack of context-sensitive single-modality approaches in emotion analysis in the still image domain. In this paper, we introduce LUCFER (Pronounced LU-CI-FER), a dataset containing over 3.6M images, with 3-dimensional labels; i.e. emotion, context and valence. LUCFER, the largest dataset of the kind currently available, is collected using a novel data collection pipeline, proposed and implemented in this work. Moreover, we train a context-sensitive deep classifier using a novel multinomial classification technique proposed here via adding a dimensionality reduction layer to the CNN. Relying on our categorical approach to emotion recognition, we claim and show empirically that injecting context to our unified training process helps (1) achieve a more balanced precision and recall, and (2) boost performance, yielding an overall classification accuracy of 73.12% compared to 58.3% achieved in the closest work in the literature.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121312296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a new thermal-empowered multi-task network (TEMT-Net) to improve facial action unit detection. Our primary goal is to leverage the setting in which the training set contains multi-modality data while the application scenario has only one modality. Thermal images are robust to illumination changes and face color. In the proposed multi-task framework, we utilize data from both modalities. Action unit detection and facial landmark detection are correlated tasks. To exploit the advantages and correlations of the different modalities and tasks, we propose a novel thermal-empowered multi-task deep neural network learning approach for action unit detection, facial landmark detection, and thermal image reconstruction simultaneously. The thermal image generator and the facial landmark detector regularize the learned features, which share factors with the input color images. Extensive experiments are conducted on the BP4D and MMSE databases, with comparisons to state-of-the-art methods. The experiments show that the multi-modality framework improves AU detection significantly.
{"title":"Multi-Modality Empowered Network for Facial Action Unit Detection","authors":"Peng Liu, Zheng Zhang, Huiyuan Yang, L. Yin","doi":"10.1109/WACV.2019.00235","DOIUrl":"https://doi.org/10.1109/WACV.2019.00235","url":null,"abstract":"This paper presents a new thermal empowered multi-task network (TEMT-Net) to improve facial action unit detection. Our primary goal is to leverage the situation that the training set has multi-modality data while the application scenario only has one modality. Thermal images are robust to illumination and face color. In the proposed multi-task framework, we utilize both modality data. Action unit detection and facial landmark detection are correlated tasks. To utilize the advantage and the correlation of different modalities and different tasks, we propose a novel thermal empowered multi-task deep neural network learning approach for action unit detection, facial landmark detection and thermal image reconstruction simultaneously. The thermal image generator and facial landmark detection provide regularization on the learned features with shared factors as the input color images. Extensive experiments are conducted on the BP4D and MMSE databases, with the comparison to the state of the art methods. The experiments show that the multi-modality framework improves the AU detection significantly.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117240441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data, however, has been crucial to the recent advances in image understanding achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation, and it has a modular structure that can be easily adapted to any image dataset.
{"title":"FreeLabel: A Publicly Available Annotation Tool Based on Freehand Traces","authors":"P. Dias, Zhou Shen, A. Tabb, Henry Medeiros","doi":"10.1109/WACV.2019.00010","DOIUrl":"https://doi.org/10.1109/WACV.2019.00010","url":null,"abstract":"Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data has been, however, crucial for the advances on image understanding tasks recently achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced or private annotation and has a modular structure that can be easily adapted for any image dataset.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125361209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}