
Proceedings of the 26th ACM International Conference on Multimedia: Latest Publications

Session details: Vision-3 (Applications in Multimedia)
Pub Date : 2018-10-15 DOI: 10.1145/3286935
Liqiang Nie
Citations: 0
Semantic Image Inpainting with Progressive Generative Networks
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240625
Haoran Zhang, Zhenzhen Hu, Changzhi Luo, W. Zuo, M. Wang
Recently, the image inpainting task has revived with the help of deep learning techniques. Deep neural networks, especially generative adversarial networks (GANs), make it possible to recover missing details in images. Due to the lack of sufficient context information, most existing methods fail to produce satisfactory inpainting results. This work investigates a more challenging problem, namely the newly emerging semantic image inpainting, a task of filling in large holes in natural images. In this paper, we propose an end-to-end framework named progressive generative networks (PGN), which regards the semantic image inpainting task as a curriculum learning problem. Specifically, we divide the hole-filling process into several phases, each of which aims to finish a course of the entire curriculum. An LSTM framework is then used to string all the phases together. By introducing this learning strategy, our approach progressively shrinks the large corrupted regions in natural images and yields promising inpainting results. Moreover, the proposed approach is fast to evaluate, as the entire hole filling is performed in a single forward pass. Extensive experiments on the Paris Street View and ImageNet datasets clearly demonstrate the superiority of our approach. Code for our models is available at https://github.com/crashmoon/Progressive-Generative-Networks.
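The phase-by-phase curriculum can be illustrated with a toy stand-in: below, a simple local-mean fill (a placeholder for each phase's learned generator, not the paper's network) shrinks the hole one boundary ring per phase, in the spirit of PGN's progressive scheme. All function names are illustrative.

```python
import numpy as np

def boundary_ring(mask):
    """Return the hole pixels (mask == True) that touch at least one known pixel."""
    known = ~mask
    pad = np.pad(known, 1, constant_values=False)
    # a hole pixel is on the ring if any 4-neighbour is known
    neigh = pad[:-2, 1:-1] | pad[2:, 1:-1] | pad[1:-1, :-2] | pad[1:-1, 2:]
    return mask & neigh

def progressive_fill(image, mask):
    """Shrink the hole phase by phase; a local mean over known neighbours
    stands in for the learned generator of each curriculum phase."""
    img, m = image.astype(float), mask.copy()
    phases = 0
    while m.any():
        ring = boundary_ring(m)
        if not ring.any():          # safety guard: nothing left to fill
            break
        for y, x in zip(*np.nonzero(ring)):
            y0, x0 = max(y - 1, 0), max(x - 1, 0)
            patch = img[y0:y + 2, x0:x + 2]
            known = ~m[y0:y + 2, x0:x + 2]
            img[y, x] = patch[known].mean()
        m &= ~ring                  # this phase's ring is now known
        phases += 1
    return img, phases
```

A 4x4 hole, for instance, closes in two phases: the outer ring first, then the inner 2x2 block.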
Citations: 107
Deep Learning for Multimedia: Science or Technology?
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3243931
J. Sang, Jun Yu, R. Jain, R. Lienhart, Peng Cui, Jiashi Feng
Deep learning has been successfully explored in addressing different multimedia topics in recent years, ranging from object detection, semantic classification, and entity annotation to multimedia captioning, multimedia question answering, and storytelling. Open-source libraries and platforms such as TensorFlow, Caffe, and MXNet significantly help promote the wide deployment of deep learning in real-world applications. On one hand, deep learning practitioners, while not necessarily understanding the math involved, are able to set up and make use of a complex deep network. One recent deep learning tool based on Keras even provides a graphical interface that enables straightforward 'drag and drop' deep learning programming. On the other hand, however, general theoretical problems of learning, such as interpretation and generalization, have seen only limited progress. Most deep learning papers published these days follow the pipeline of designing or modifying network structures, tuning parameters, and reporting performance improvements in specific applications. We have even seen many deep learning application papers without a single equation. Theoretical interpretation and the science behind the studies are largely ignored. While excited about the successful application of deep learning to classical and novel problems, we multimedia researchers are responsible for thinking about and solving the fundamental topics in deep learning science. Prof. Guanrong Chen recently wrote an editorial note titled 'Science and Technology, not SciTech' [1]. This panel falls into a similar discussion and aims to invite prestigious multimedia researchers and active deep learning practitioners to discuss the positioning of deep learning research now and in the future.
Specifically, each panelist is asked to present their opinions on the following five questions: 1) What do you think of the current phenomenon that deep learning applications are growing explosively while the general theoretical problems see only slow progress? 2) Do you agree that deploying deep learning techniques is getting easy (a low barrier), while deep learning research remains difficult (a high barrier)? 3) What do you think are the core problems for deep learning techniques? 4) What do you think are the core problems for deep learning science? 5) What is your suggestion for multimedia research in the post-deep-learning era?
Citations: 5
StripNet
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240553
Guoxiang Qu, Wenwei Zhang, Zhe Wang, Xing Dai, Jianping Shi, Junjun He, Fei Li, Xiulan Zhang, Y. Qiao
In this work, we propose to study a special semantic segmentation problem where the targets are long, continuous strip patterns. Strip patterns widely exist in medical images and natural photos, such as retinal layers in OCT images and lanes on roads, and their segmentation has practical significance. Traditional pixel-level segmentation methods largely ignore the structural prior of strip patterns and thus easily suffer from topological inconformity problems, such as holes and isolated islands in the segmentation results. To tackle this problem, we design a novel deep framework, StripNet, that leverages the strong end-to-end learning ability of CNNs to predict the structured outputs as a sequence of boundary locations of the target strips. Specifically, StripNet decomposes the original segmentation problem into more easily solved local boundary-regression problems, and takes into account the topological constraints on the predicted boundaries. Moreover, our framework adopts a coarse-to-fine strategy and uses carefully designed heatmaps for training the boundary localization network. We examine StripNet on two challenging strip pattern segmentation tasks, retinal layer segmentation and lane detection. Extensive experiments demonstrate that StripNet achieves excellent results and outperforms state-of-the-art methods on both tasks.
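The boundary-location encoding that StripNet regresses can be sketched on a toy mask. The helpers below are hypothetical, not the authors' code, but they show why a per-column (top, bottom) representation rules out holes and islands within a column:

```python
import numpy as np

def strip_to_boundaries(mask):
    """Encode a binary strip mask as per-column (top, bottom) boundary rows,
    the kind of structured output regressed instead of per-pixel labels."""
    tops, bots = [], []
    for col in mask.T:
        rows = np.nonzero(col)[0]
        if rows.size == 0:
            tops.append(-1); bots.append(-1)   # strip absent in this column
        else:
            tops.append(int(rows[0])); bots.append(int(rows[-1]))
    return tops, bots

def boundaries_to_strip(tops, bots, height):
    """Decode back to a mask; by construction each column is one solid run,
    so holes and isolated islands within a column cannot occur."""
    mask = np.zeros((height, len(tops)), bool)
    for c, (t, b) in enumerate(zip(tops, bots)):
        if t >= 0:
            mask[t:b + 1, c] = True
    return mask
```

Round-tripping a mask that contains a spurious hole repairs it, which is the topological benefit the paper exploits.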
Citations: 12
An Implementation of a DASH Client for Browsing Networked Virtual Environment
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241398
Thomas Forgione, A. Carlier, Géraldine Morin, Wei Tsang Ooi, V. Charvillat, P. Yadav
We demonstrate the use of DASH, a widely deployed standard for streaming video content, to stream 3D content in an NVE (Networked Virtual Environment) consisting of 3D geometry and associated textures. We have developed a DASH client for NVEs to show how they benefit from the advantages of DASH: it offers a scalable, easy-to-deploy 3D streaming framework. In our system, the 3D content is first statically partitioned into compliant DASH data, and metadata is provided so that the client can manage which data to download. Based on a proposed utility metric for geometry and texture at different resolutions, the client can choose the content to request depending on its viewpoint. We provide a Web-based client that navigates through our sample 3D scene while deriving the streaming requests from its computation of the necessary online parameters, in a receiver-driven manner.
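A rough sketch of receiver-driven, utility-guided request planning is given below. The utility decay `1 / (1 + distance)` and the greedy utility-per-byte ranking are assumptions for illustration, not the paper's exact metric:

```python
def plan_requests(objects, budget_bytes):
    """objects: list of (obj_id, viewpoint_distance, [(level, size_bytes, utility), ...]).
    Greedily request, per object, the level with the best distance-discounted
    utility per byte until the byte budget for this round is spent."""
    candidates = []
    for obj_id, dist, levels in objects:
        for level, size, util in levels:
            score = util / (1.0 + dist) / size   # assumed decay with distance
            candidates.append((score, obj_id, level, size))
    candidates.sort(reverse=True)                # best value-per-byte first
    plan, spent, served = [], 0, set()
    for score, obj_id, level, size in candidates:
        if obj_id in served or spent + size > budget_bytes:
            continue
        plan.append((obj_id, level))
        served.add(obj_id)
        spent += size
    return plan
```

Under a tight budget, the planner spends bytes on nearby objects first, which mirrors the viewpoint-dependent behaviour described above.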
Citations: 9
When to Learn What: Deep Cognitive Subspace Clustering
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240582
Yangbangyan Jiang, Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang
Subspace clustering aims at clustering data points drawn from a union of low-dimensional subspaces. Recently, deep neural networks have been introduced into this problem to improve both representation ability and precision for non-linear data. However, such models are sensitive to noise and outliers, since difficult and easy samples are treated equally. By contrast, in the human cognitive process, individuals tend to follow a learning paradigm from easy to hard and from less to more: human beings always learn simple concepts first, then gradually absorb more complicated ones. Inspired by this learning scheme, in this paper we propose a robust deep subspace clustering framework based on the principle of the human cognitive process. Specifically, we measure the easiness of samples dynamically so that our method can gradually utilize instances from easy to more complex ones in a robust way. Meanwhile, a solution is designed to update the weights and parameters using an alternating optimization strategy, followed by a theoretical analysis demonstrating the rationality of the proposed method. Experimental results on three popular benchmark datasets demonstrate the validity of the proposed method.
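The easy-to-hard weighting can be sketched in the spirit of self-paced learning; the hard threshold on per-sample loss below is an assumed stand-in for the paper's dynamic easiness measure:

```python
import numpy as np

def cognitive_weights(losses, age):
    """Easiness-based sample weights: samples whose loss falls below the
    current 'age' threshold are admitted (weight 1); harder samples are
    held back until the threshold grows."""
    return (np.asarray(losses, float) < age).astype(float)

def train_schedule(losses, ages):
    """Count how many samples are admitted at each age, showing the
    curriculum growing from easy to hard."""
    return [int(cognitive_weights(losses, a).sum()) for a in ages]
```

As the age parameter increases across training rounds, progressively harder samples enter the objective, which is what makes the model robust to noise and outliers early on.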
Citations: 28
Object-Difference Attention: A Simple Relational Attention for Visual Question Answering
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240513
Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong
The attention mechanism has greatly promoted the development of Visual Question Answering (VQA). The attention distribution, which weights objects in an image (such as image regions or bounding boxes) differently according to their importance for answering a question, plays a crucial role in this mechanism. Most existing work focuses on fusing image features and text features to calculate the attention distribution, without comparisons between different image objects. Yet selectivity, a major property of attention, depends on comparisons between different objects, and comparisons provide more information for assigning attention better. To achieve this, we propose an object-difference attention (ODA), which calculates the attention probability by applying a difference operator between different image objects under the guidance of the question in hand. Experimental results on three publicly available datasets show that our ODA-based VQA model achieves state-of-the-art results. Furthermore, a general form of relational attention is proposed. Besides ODA, several other relational attentions are given. Experimental results show that those relational attentions have strengths on different types of questions.
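A minimal sketch of difference-based attention follows. The dot-product scoring of pairwise differences against the question vector is an assumption for illustration; the paper's ODA uses a learned difference operator:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def object_difference_attention(objects, question):
    """Score each object by its feature differences to all other objects,
    projected onto the question vector, then normalize with softmax."""
    n = objects.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        diff = objects[i] - objects          # (n, d) pairwise differences
        scores[i] = (diff @ question).sum()  # question-guided comparison
    attn = softmax(scores)
    return attn, attn @ objects              # attended image feature
```

An object that stands out from the others along the question direction receives the largest attention weight, which captures the selectivity-by-comparison idea above.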
Citations: 36
Session details: Multimedia-3 (Multimedia Search)
Pub Date : 2018-10-15 DOI: 10.1145/3286940
J. Sang
Citations: 0
Exploring Temporal Communities in Mass Media Archives
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241392
Haolin Ren, B. Renoust, G. Melançon, M. Viaud, S. Satoh
One key task in analyzing large multimedia archives over time is to dynamically monitor the activity of concepts and entities along with their interactions. This is helpful for analyzing threads of topics across news archives (how stories unfold) or for monitoring the evolution and development of social groups. Dynamic graph modeling is a powerful tool for capturing these interactions over time, while visualization and community finding remain difficult, especially with a high density of links. We propose to extract the backbone of dynamic graphs in order to ease community detection and guide the exploration of trend evolution. Through the graph structure, we interactively coordinate node-link diagrams, Sankey diagrams, time series, and animations in order to extract patterns and follow community behavior. We illustrate our system by exploring the role of soccer in 6 years of TV/radio magazines in France, and the role of North Korea in about 10 years of Japanese news.
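Backbone extraction can be approximated with a per-node top-k edge filter, shown below as a simple stand-in (the system's actual backbone method may differ):

```python
from collections import defaultdict

def backbone_top_k(weighted_edges, k=2):
    """Per-node top-k filter: each node keeps its k strongest links.
    An edge survives if either endpoint ranks it among its top k, which
    thins dense graphs while preserving local structure for community
    detection."""
    by_node = defaultdict(list)
    for u, v, w in weighted_edges:
        by_node[u].append((w, u, v))
        by_node[v].append((w, u, v))
    kept = set()
    for node, edges in by_node.items():
        for w, u, v in sorted(edges, reverse=True)[:k]:
            kept.add((u, v, w))
    return sorted(kept)
```

Weak links between peripheral nodes are pruned while each node's strongest interactions survive, which is the sparsification needed before community detection on dense co-occurrence graphs.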
Citations: 3
Personalized Multiple Facial Action Unit Recognition through Generative Adversarial Recognition Network
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240613
Can Wang, Shangfei Wang
Personalized facial action unit (AU) recognition is challenging due to subject-dependent facial behavior. This paper proposes a method to recognize multiple personalized facial AUs through a novel generative adversarial network, which adapts the distribution of source-domain facial images to that of target-domain facial images and detects multiple AUs by leveraging AU dependencies. Specifically, we use a generative adversarial network to generate synthetic images from the source domain; the synthetic images have a similar appearance to the target subject while retaining the AU patterns of the source images. We simultaneously leverage AU dependencies to train a multiple-AU classifier. Experimental results on three benchmark databases demonstrate that the proposed method successfully realizes unsupervised domain adaptation for individual AU detection and thus outperforms state-of-the-art AU detection methods.
Citations: 17
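The abstract above leverages AU dependencies when training the classifier but does not spell out a mechanism. The following hypothetical sketch (the function name, scores, and co-occurrence weights are all invented for illustration, not the paper's model) shows one way pairwise co-occurrence priors can refine independently predicted AU scores:

```python
def refine_with_dependencies(scores, co_occur, alpha=0.3):
    """Blend each AU's independent score with a dependency prior:
    the prior averages the evidence lent by correlated AUs
    (co-occurrence strength times the correlated AU's score)."""
    refined = {}
    for au, s in scores.items():
        support = [co_occur[(au, other)] * p
                   for other, p in scores.items()
                   if other != au and (au, other) in co_occur]
        prior = sum(support) / len(support) if support else s
        refined[au] = (1 - alpha) * s + alpha * prior
    return refined

# AU6 (cheek raiser) and AU12 (lip corner puller) co-occur in smiles,
# so a confident AU6 raises a borderline AU12 (weights are made up).
scores = {"AU6": 0.90, "AU12": 0.55, "AU4": 0.10}
co_occur = {("AU6", "AU12"): 0.8, ("AU12", "AU6"): 0.8}
print(refine_with_dependencies(scores, co_occur))
```

AUs with no recorded dependencies (AU4 here) are left unchanged; in a trained model this kind of prior would instead be learned jointly with the classifier.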
Journal
Proceedings of the 26th ACM international conference on Multimedia