
Latest publications: Proceedings of the 26th ACM international conference on Multimedia

Session details: FF-3
Pub Date : 2018-10-15 DOI: 10.1145/3286925
Zhu Li
{"title":"Session details: FF-3","authors":"Zhu Li","doi":"10.1145/3286925","DOIUrl":"https://doi.org/10.1145/3286925","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122035688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Interactive Video Search: Where is the User in the Age of Deep Learning?
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241473
Klaus Schöffmann, W. Bailer, C. Gurrin, G. Awad, Jakub Lokoč
In this tutorial we discuss interactive video search tools and methods, review why they are still needed in the age of deep learning, and explore video and multimedia search challenges and their role as evaluation benchmarks in the field of multimedia information retrieval. We cover three different campaigns (TRECVID, the Video Browser Showdown, and the Lifelog Search Challenge), discuss their goals and rules, and present their findings over the last half-decade. Moreover, we talk about datasets, tasks, evaluation procedures, and examples of interactive video search tools, as well as how they have evolved over the years. Participants in this tutorial will gain collective insights from all three challenges and can use them to focus their research efforts on outstanding problems that remain unsolved in this area.
Citations: 7
Generative Adversarial Product Quantisation
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240590
Litao Yu, Yongsheng Gao, J. Zhou
Product Quantisation (PQ) has been recognised as an effective encoding technique for scalable multimedia content analysis. In this paper, we propose a novel learning framework that enables an end-to-end encoding strategy from raw images to compact PQ codes. The system aims to learn both PQ encoding functions and codewords for content-based image retrieval. Specifically, we first design a trainable encoding layer that is pluggable into neural networks, so that the codewords can be trained by backpropagation. We then integrate it into a Deep Convolutional Generative Adversarial Network (DC-GAN). In our proposed encoding framework, raw images are encoded directly by passing through the convolutional and encoding layers, and the generator uses the codewords as constrained inputs to generate full image representations that are visually similar to the original images. By taking advantage of the generative adversarial model, our proposed system can produce high-quality PQ codewords and encoding functions for scalable multimedia retrieval tasks. Experiments show that the proposed architecture, GA-PQ, outperforms state-of-the-art encoding techniques on three public image datasets.
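To make the trainable-encoding idea concrete, below is a minimal sketch of a differentiable product-quantisation layer in PyTorch. The class name SoftPQLayer, the subspace count M, the codebook size K, and the softmax-based soft assignment are illustrative assumptions, not the authors' implementation; soft assignments are simply one way codewords can receive gradients during backpropagation.

```python
# Minimal sketch of a differentiable product-quantisation layer (assumed
# names and shapes; not the authors' code). Soft assignments let gradients
# reach the codewords, approximating "codewords trained by backpropagation".
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPQLayer(nn.Module):
    def __init__(self, dim=128, M=8, K=256, temperature=1.0):
        super().__init__()
        assert dim % M == 0
        self.M, self.K, self.d = M, K, dim // M
        # One codebook of K codewords per subspace.
        self.codebooks = nn.Parameter(torch.randn(M, K, self.d))
        self.temperature = temperature

    def forward(self, x):
        # x: (batch, dim) -> split into M subvectors of size d.
        b = x.size(0)
        sub = x.view(b, self.M, 1, self.d)                          # (b, M, 1, d)
        dist = ((sub - self.codebooks.unsqueeze(0)) ** 2).sum(-1)   # (b, M, K)
        assign = F.softmax(-dist / self.temperature, dim=-1)        # soft one-hot
        # Soft reconstruction: weighted sum of codewords per subspace.
        recon = torch.einsum('bmk,mkd->bmd', assign, self.codebooks)
        return recon.reshape(b, -1), assign.argmax(-1)  # features, hard PQ codes

x = torch.randn(4, 128)
layer = SoftPQLayer()
recon, codes = layer(x)   # codes: (4, 8) integer PQ codes, one index per subspace
```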
Citations: 2
Shadow Calligraphy of Dance: An Image-Based Interactive Installation for Capturing Flowing Human Figures
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3264576
Lyn Chao-ling Chen, He-Lin Luo
The artwork explores the theme of flowing human figures. People pass through familiar places day after day, creating connections between themselves and the city. Impressions, memories and experiences turn undefined urban space into meaningful place, creating a virtual layer upon the physical world. By revealing these invisible traces, the artwork seeks to make people aware of the connection between themselves and their environment. The interactive installation was set up in an outdoor exhibition: a camera was aligned with the road, and a projector cast the image onto the wall of a nearby building. Object detection is used to capture the movements of passers-by. Gaussian mixture model (GMM) background modeling captures frames with vivid face features, with parameters tuned to generate an afterimage effect. The projected picture combines 25 frames with different update intervals to produce a delayed vision; only one region at the center of the image plays the current frame in real time, prompting the audience to notice the connection between their movements and the projection. In addition, some frames are mirrored horizontally to create a dynamic Chinese brush painting with an aesthetic composition. The figures remaining on the wall, like marks or prints, remind people of their traces in the city, connecting the city with everyone who has passed through the place. The improvisational body calligraphy is thus exhibited collaboratively, revealing the crowd's facial features and human shapes on the physical level, and their shared experiences and memories on the mental level.
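As a rough illustration of the capture pipeline described above, the sketch below combines OpenCV's standard GMM background subtractor (MOG2) with a decaying trail. The decay constant is a simplified stand-in for the installation's 25-frame delayed-vision effect, and the camera index and all parameter values are assumptions, not the artists' actual configuration.

```python
# Minimal sketch: GMM background subtraction plus an afterimage trail,
# approximating the installation's effect (assumed parameters throughout).
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                      # camera aligned with the road
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
afterimage = None
DECAY = 0.96                                   # assumed fade rate for the trail

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)             # moving figures -> foreground mask
    figure = cv2.bitwise_and(frame, frame, mask=mask)
    if afterimage is None:
        afterimage = np.zeros_like(frame, dtype=np.float32)
    # Fade previous traces and stamp the current silhouette on top.
    afterimage = afterimage * DECAY + figure.astype(np.float32)
    out = np.clip(afterimage, 0, 255).astype(np.uint8)
    out = cv2.flip(out, 1)                     # horizontal mirroring, as in the piece
    cv2.imshow('shadow calligraphy', out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```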
Citations: 1
High-Quality Exposure Correction of Underexposed Photos
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240595
Qing Zhang, Ganzhao Yuan, Chunxia Xiao, Lei Zhu, Weishi Zheng
We address the problem of correcting the exposure of underexposed photos. Previous methods have tackled this problem from many different perspectives and achieved remarkable progress. However, they usually fail to produce natural-looking results due to visual artifacts such as color distortion, loss of detail, and exposure inconsistency. We find that the main reason existing methods induce these artifacts is that they break the perceptual similarity between the input and output. Based on this observation, an effective criterion, termed perceptually bidirectional similarity (PBS), is proposed. Based on this criterion and the Retinex theory, we cast the exposure correction problem as an illumination estimation optimization, where PBS is defined as three constraints for estimating an illumination that generates the desired result with even exposure, vivid color and clear textures. Qualitative and quantitative comparisons, together with a user study, demonstrate the superiority of our method over state-of-the-art methods.
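For intuition about the Retinex-based formulation: the Retinex model decomposes an observed image I into illumination L and reflectance R, with I = L x R per pixel, so correcting exposure amounts to estimating L and adjusting it while keeping R fixed. The sketch below illustrates that decomposition with a crude Gaussian-smoothed illumination estimate and a gamma adjustment; it is not the paper's PBS-constrained optimizer, and the function name and the gamma/sigma values are illustrative choices.

```python
# Minimal sketch of Retinex-style exposure correction (an illustration of
# the I = L * R decomposition, not the paper's PBS-constrained optimiser).
import cv2
import numpy as np

def correct_exposure(img_bgr, gamma=0.6, sigma=15):
    img = img_bgr.astype(np.float32) / 255.0 + 1e-6
    # Crude illumination estimate: smoothed max of the colour channels.
    illum = img.max(axis=2)
    illum = cv2.GaussianBlur(illum, (0, 0), sigma)
    # Brighten the illumination map, keep reflectance fixed: I' = L^gamma * R.
    adjusted = illum ** gamma                  # gamma < 1 lifts dark regions
    out = img * (adjusted / np.maximum(illum, 1e-6))[..., None]
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

# result = correct_exposure(cv2.imread('underexposed.jpg'))
```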
Citations: 87
Demonstration of an Open Source Framework for Qualitative Evaluation of CBIR Systems
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241395
Paula Gómez Duran, Eva Mohedano, Kevin McGuinness, Xavier Giro-i-Nieto, N. O’Connor
Evaluating image retrieval systems in a quantitative way, for example by computing measures like mean average precision, allows for objective comparisons with a ground truth. However, in cases where ground truth is not available, the only alternative is to collect feedback from a user. Thus, qualitative assessments become important to better understand how the system works. In some scenarios, visualizing the results may be the only way to evaluate what a system returns, and the only opportunity to identify that it is failing. This necessitates developing a User Interface (UI) for a Content-Based Image Retrieval (CBIR) system that allows visualization of results and improvement via capturing user relevance feedback. A well-designed UI facilitates understanding of the performance of the system, both in cases where it works well and, perhaps more importantly, in those which highlight the need for improvement. Our open-source system implements three components to help researchers quickly develop these capabilities for their retrieval engines. We present: a web-based user interface to visualize retrieval results and collect user annotations; a server that simplifies connection with any underlying CBIR system; and a server that manages the search engine data. The software itself is described in a separate submission to the ACM MM Open Source Software Competition.
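For reference, the quantitative measure mentioned above, mean average precision (mAP), can be computed from ranked retrieval lists as in the minimal sketch below (illustrative code, not the framework's own evaluation module).

```python
# Minimal sketch of mean average precision (mAP) over ranked retrieval lists.
def average_precision(ranked_ids, relevant_ids):
    relevant, hits, score = set(relevant_ids), 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / rank              # precision at each relevant hit
    return score / max(len(relevant), 1)

def mean_average_precision(runs):
    # runs: list of (ranked_ids, relevant_ids) pairs, one per query.
    return sum(average_precision(r, g) for r, g in runs) / len(runs)

queries = [(['a', 'b', 'c', 'd'], {'a', 'c'}),
           (['x', 'y', 'z'], {'z'})]
print(mean_average_precision(queries))        # 0.5833...
```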
Citations: 3
Session details: Demo + Video + Makers' Program
Pub Date : 2018-10-15 DOI: 10.1145/3286930
K. Sohn, Yong Man Ro
{"title":"Session details: Demo + Video + Makers' Program","authors":"K. Sohn, Yong Man Ro","doi":"10.1145/3286930","DOIUrl":"https://doi.org/10.1145/3286930","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Facial Expression Recognition in the Wild: A Cycle-Consistent Adversarial Attention Transfer Approach
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240574
Feifei Zhang, Tianzhu Zhang, Qi-rong Mao, Ling-yu Duan, Changsheng Xu
Facial expression recognition (FER) is a very challenging problem due to the variety of expressions under arbitrary poses. Most conventional approaches perform FER in laboratory-controlled environments. Different from existing methods, in this paper we formulate FER in the wild as a domain adaptation problem and propose a novel auxiliary-domain-guided Cycle-consistent adversarial Attention Transfer model (CycleAT) for simultaneous facial image synthesis and facial expression recognition in the wild. The proposed model utilizes large-scale unlabeled web facial images as an auxiliary domain to reduce the gap between the source and target domains, based on generative adversarial networks (GAN) embedded with an effective attention transfer module, and offers several merits. First, the GAN-based method can automatically generate labeled facial images in the wild by harnessing information from labeled facial images in the source domain and unlabeled web facial images in the auxiliary domain. Second, the class-discriminative spatial attention maps from the classifier in the source domain are leveraged to boost the performance of the classifier in the target domain. Third, it effectively preserves the structural consistency of local pixels and global attributes in the synthesized facial images through pixel cycle-consistency and a discriminative loss. Quantitative and qualitative evaluations on two challenging in-the-wild datasets demonstrate that the proposed model performs favorably against state-of-the-art methods.
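The cycle-consistency term at the heart of such models translates an image from the source domain to the target domain and back, penalizing the reconstruction error. The sketch below shows the generic CycleGAN-style version of this loss with hypothetical generators G (source to target) and F (target to source); the paper's full objective additionally includes adversarial, attention transfer, and discriminative terms.

```python
# Minimal sketch of the cycle-consistency idea underlying CycleAT-style
# models (hypothetical generators G and F; this is the generic
# CycleGAN-style term, not the paper's full objective).
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, x_src, x_tgt, lam=10.0):
    l1 = nn.L1Loss()
    # Forward cycle: source -> target domain -> back to source.
    loss_fwd = l1(F(G(x_src)), x_src)
    # Backward cycle: target -> source domain -> back to target.
    loss_bwd = l1(G(F(x_tgt)), x_tgt)
    return lam * (loss_fwd + loss_bwd)

# Toy stand-ins for the two generators, just to show the call pattern.
G = nn.Conv2d(3, 3, kernel_size=3, padding=1)
F = nn.Conv2d(3, 3, kernel_size=3, padding=1)
x_src = torch.randn(2, 3, 64, 64)
x_tgt = torch.randn(2, 3, 64, 64)
print(cycle_consistency_loss(G, F, x_src, x_tgt).item())
```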
Citations: 11
Conditional Expression Synthesis with Face Parsing Transformation
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240647
Zhihe Lu, Tanhao Hu, Lingxiao Song, Zhaoxiang Zhang, R. He
Facial expression synthesis at various intensities is a challenging task due to large variations in identity appearance and a paucity of efficient means for intensity measurement. This paper advances the expression synthesis domain by introducing a Couple-Agent Face Parsing based Generative Adversarial Network (CAFP-GAN) that unites knowledge of facial semantic regions with controllable expression signals. Specifically, we employ a face parsing map as a controllable condition to guide facial texture generation with a specific expression, which provides a semantic representation of every pixel in the facial regions. Our method consists of two sub-networks: a face parsing prediction network (FPPN) uses controllable labels (expression and intensity) to generate a face parsing map transformation that corresponds to the labels from the input neutral face, and a facial expression synthesis network (FESN) incorporates the pretrained FPPN to provide the face parsing map as guidance for expression synthesis. To enhance the realism of the results, couple-agent discriminators are used to distinguish fake-real pairs in both sub-networks. Moreover, we only need the neutral face and the labels to synthesize an unseen expression at different intensities. Experimental results on three popular facial expression databases show that our method has a compelling ability for continuous expression synthesis.
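A common way to condition image synthesis on a parsing map, sketched below, is to one-hot encode the per-pixel class map and concatenate it channel-wise with the input face at the generator's input. The tiny ParsingConditionedGenerator and the class count N_PARTS are assumptions for illustration; the paper's actual CAFP-GAN architecture may differ.

```python
# Minimal sketch of conditioning a generator on a face parsing map via
# channel concatenation (a common conditioning scheme, assumed here).
import torch
import torch.nn as nn

N_PARTS = 11  # assumed number of face-parsing classes (skin, brows, eyes, ...)

class ParsingConditionedGenerator(nn.Module):
    def __init__(self, img_ch=3, parts=N_PARTS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + parts, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, neutral_face, parsing_labels):
        # parsing_labels: (b, H, W) integer class map -> one-hot (b, parts, H, W)
        onehot = nn.functional.one_hot(parsing_labels, N_PARTS)
        onehot = onehot.permute(0, 3, 1, 2).float()
        return self.net(torch.cat([neutral_face, onehot], dim=1))

g = ParsingConditionedGenerator()
face = torch.randn(1, 3, 64, 64)
labels = torch.randint(0, N_PARTS, (1, 64, 64))
out = g(face, labels)   # synthesized expression image, (1, 3, 64, 64)
```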
Citations: 18
Post Tuned Hashing: A New Approach to Indexing High-dimensional Data
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240529
Zhendong Mao, Quan Wang, Yongdong Zhang, Bin Wang
Learning to hash has proven to be an effective solution for indexing high-dimensional data by projecting the data to similarity-preserving binary codes. However, most existing methods end their learning scheme with a binarization stage, i.e. binary quantization, which inevitably destroys the neighborhood structure of the original data. As a result, those methods still suffer from substantial similarity loss and deliver unsatisfactory indexing performance. In this paper we propose a novel hashing model, namely Post Tuned Hashing (PTH), which includes a new post-tuning stage to refine the binary codes after binarization. The post-tuning seeks to rebuild the destroyed neighborhood structure and hence significantly improves the indexing performance. We cast the post-tuning into a binary quadratic optimization framework and, despite its NP-hardness, give a practical algorithm to efficiently obtain a high-quality solution. Experimental results on five well-known image benchmarks show that our PTH improves on previous state-of-the-art methods by 13-58% in mean average precision.
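To illustrate what a post-tuning stage can look like, the sketch below starts from sign-binarized codes and refines each bit column so that Hamming-based inner products better match the original cosine similarities. The greedy sign update is a simple heuristic stand-in for the paper's binary quadratic optimization, not the actual PTH algorithm.

```python
# Minimal sketch of the post-tuning idea: binarise first, then refine bits
# so Hamming similarity better matches the original cosine similarity.
# Greedy bit-column updates stand in for the paper's binary quadratic
# optimisation; this is not the actual PTH algorithm.
import numpy as np

def post_tune(X, B, iters=3):
    # X: (n, d) real features; B: (n, r) codes in {-1, +1}.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    S = np.clip(X @ X.T / (norms * norms.T), -1, 1)   # target cosine similarity
    r = B.shape[1]
    for _ in range(iters):
        for k in range(r):
            # Residual with bit column k removed: r*S - sum_{j != k} b_j b_j^T.
            residual = r * S - B @ B.T + np.outer(B[:, k], B[:, k])
            # Heuristic sign update: b_k <- sign(residual @ b_k) pushes b_k
            # toward the residual's dominant eigenvector, raising b_k^T R b_k.
            B[:, k] = np.where(residual @ B[:, k] >= 0, 1, -1)
    return B

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
B = np.sign(rng.normal(size=(100, 16)))   # initial sign-binarised codes
B_tuned = post_tune(X, B)
```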
Citations: 2