{"title":"Session details: Vision-3 (Applications in Multimedia)","authors":"Liqiang Nie","doi":"10.1145/3286935","DOIUrl":"https://doi.org/10.1145/3286935","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114990356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haoran Zhang, Zhenzhen Hu, Changzhi Luo, W. Zuo, M. Wang
Recently, the image inpainting task has been revived with the help of deep learning techniques. Deep neural networks, especially generative adversarial networks (GANs), make it possible to recover missing details in images. Due to the lack of sufficient context information, however, most existing methods fail to produce satisfactory inpainting results. This work investigates a more challenging problem, namely the newly emerging task of semantic image inpainting: filling in large holes in natural images. In this paper, we propose an end-to-end framework named progressive generative networks (PGN), which regards semantic image inpainting as a curriculum learning problem. Specifically, we divide the hole-filling process into several phases, each of which completes one course of the entire curriculum, and an LSTM framework is used to string the phases together. With this learning strategy, our approach progressively shrinks the large corrupted regions in natural images and yields promising inpainting results. Moreover, the proposed approach is fast to evaluate, as the entire hole filling is performed in a single forward pass. Extensive experiments on the Paris Street View and ImageNet datasets clearly demonstrate the superiority of our approach. Code for our models is available at https://github.com/crashmoon/Progressive-Generative-Networks.
{"title":"Semantic Image Inpainting with Progressive Generative Networks","authors":"Haoran Zhang, Zhenzhen Hu, Changzhi Luo, W. Zuo, M. Wang","doi":"10.1145/3240508.3240625","DOIUrl":"https://doi.org/10.1145/3240508.3240625","url":null,"abstract":"Recently, the image inpainting task has been revived with the help of deep learning techniques. Deep neural networks, especially generative adversarial networks (GANs), make it possible to recover missing details in images. Due to the lack of sufficient context information, however, most existing methods fail to produce satisfactory inpainting results. This work investigates a more challenging problem, namely the newly emerging task of semantic image inpainting: filling in large holes in natural images. In this paper, we propose an end-to-end framework named progressive generative networks (PGN), which regards semantic image inpainting as a curriculum learning problem. Specifically, we divide the hole-filling process into several phases, each of which completes one course of the entire curriculum, and an LSTM framework is used to string the phases together. With this learning strategy, our approach progressively shrinks the large corrupted regions in natural images and yields promising inpainting results. Moreover, the proposed approach is fast to evaluate, as the entire hole filling is performed in a single forward pass. Extensive experiments on the Paris Street View and ImageNet datasets clearly demonstrate the superiority of our approach. Code for our models is available at https://github.com/crashmoon/Progressive-Generative-Networks.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116681027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Sang, Jun Yu, R. Jain, R. Lienhart, Peng Cui, Jiashi Feng
Deep learning has been successfully explored in addressing different multimedia topics in recent years, ranging from object detection, semantic classification, and entity annotation to multimedia captioning, multimedia question answering, and storytelling. Open-source libraries and platforms such as TensorFlow, Caffe, and MXNet have significantly helped promote the wide deployment of deep learning in real-world applications. On one hand, deep learning practitioners are able to set up and make use of a complex deep network without necessarily understanding the math behind it; one recent Keras-based tool even provides a graphical interface that enables straightforward 'drag-and-drop' deep learning programming. On the other hand, general theoretical problems of learning, such as interpretability and generalization, have seen only limited progress. Most deep learning papers published these days follow the pipeline of designing or modifying network structures, tuning parameters, and reporting performance improvements in specific applications. We have even seen many deep learning application papers without a single equation. Theoretical interpretation and the science behind the studies are largely ignored. While excited about the successful application of deep learning to classical and novel problems, we multimedia researchers are responsible for thinking about and solving the fundamental problems of deep learning science. Prof. Guanrong Chen recently wrote an editorial note titled 'Science and Technology, not SciTech' [1]. This panel takes up a similar discussion and invites prestigious multimedia researchers and active deep learning practitioners to discuss the positioning of deep learning research now and in the future. Specifically, each panelist is asked to present their opinions on the following five questions: 1) How do you view the current situation in which deep learning applications are growing explosively while progress on general theoretical problems remains slow? 2) Do you agree that deploying deep learning techniques is becoming easy (a low barrier), while deep learning research remains difficult (a high barrier)? 3) What do you think are the core problems for deep learning techniques? 4) What do you think are the core problems for deep learning science? 5) What is your suggestion for multimedia research in the post-deep-learning era?
{"title":"Deep Learning for Multimedia: Science or Technology?","authors":"J. Sang, Jun Yu, R. Jain, R. Lienhart, Peng Cui, Jiashi Feng","doi":"10.1145/3240508.3243931","DOIUrl":"https://doi.org/10.1145/3240508.3243931","url":null,"abstract":"Deep learning has been successfully explored in addressing different multimedia topics in recent years, ranging from object detection, semantic classification, and entity annotation to multimedia captioning, multimedia question answering, and storytelling. Open-source libraries and platforms such as TensorFlow, Caffe, and MXNet have significantly helped promote the wide deployment of deep learning in real-world applications. On one hand, deep learning practitioners are able to set up and make use of a complex deep network without necessarily understanding the math behind it; one recent Keras-based tool even provides a graphical interface that enables straightforward 'drag-and-drop' deep learning programming. On the other hand, general theoretical problems of learning, such as interpretability and generalization, have seen only limited progress. Most deep learning papers published these days follow the pipeline of designing or modifying network structures, tuning parameters, and reporting performance improvements in specific applications. We have even seen many deep learning application papers without a single equation. Theoretical interpretation and the science behind the studies are largely ignored. While excited about the successful application of deep learning to classical and novel problems, we multimedia researchers are responsible for thinking about and solving the fundamental problems of deep learning science. Prof. Guanrong Chen recently wrote an editorial note titled 'Science and Technology, not SciTech' [1]. This panel takes up a similar discussion and invites prestigious multimedia researchers and active deep learning practitioners to discuss the positioning of deep learning research now and in the future. Specifically, each panelist is asked to present their opinions on the following five questions: 1) How do you view the current situation in which deep learning applications are growing explosively while progress on general theoretical problems remains slow? 2) Do you agree that deploying deep learning techniques is becoming easy (a low barrier), while deep learning research remains difficult (a high barrier)? 3) What do you think are the core problems for deep learning techniques? 4) What do you think are the core problems for deep learning science? 5) What is your suggestion for multimedia research in the post-deep-learning era?","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121050506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we propose to study a special semantic segmentation problem where the targets are long, continuous strip patterns. Strip patterns widely exist in medical images and natural photos, such as retinal layers in OCT images and lanes on roads, and their segmentation has practical significance. Traditional pixel-level segmentation methods largely ignore the structural prior of strip patterns and thus easily suffer from topological inconsistencies, such as holes and isolated islands, in the segmentation results. To tackle this problem, we design a novel deep framework, StripNet, that leverages the strong end-to-end learning ability of CNNs to predict the structured output as a sequence of boundary locations of the target strips. Specifically, StripNet decomposes the original segmentation problem into more easily solved local boundary-regression problems and takes into account the topological constraints on the predicted boundaries. Moreover, our framework adopts a coarse-to-fine strategy and uses carefully designed heatmaps for training the boundary localization network. We examine StripNet on two challenging strip-pattern segmentation tasks, retinal layer segmentation and lane detection. Extensive experiments demonstrate that StripNet achieves excellent results and outperforms state-of-the-art methods on both tasks.
{"title":"StripNet","authors":"Guoxiang Qu, Wenwei Zhang, Zhe Wang, Xing Dai, Jianping Shi, Junjun He, Fei Li, Xiulan Zhang, Y. Qiao","doi":"10.1145/3240508.3240553","DOIUrl":"https://doi.org/10.1145/3240508.3240553","url":null,"abstract":"In this work, we propose to study a special semantic segmentation problem where the targets are long, continuous strip patterns. Strip patterns widely exist in medical images and natural photos, such as retinal layers in OCT images and lanes on roads, and their segmentation has practical significance. Traditional pixel-level segmentation methods largely ignore the structural prior of strip patterns and thus easily suffer from topological inconsistencies, such as holes and isolated islands, in the segmentation results. To tackle this problem, we design a novel deep framework, StripNet, that leverages the strong end-to-end learning ability of CNNs to predict the structured output as a sequence of boundary locations of the target strips. Specifically, StripNet decomposes the original segmentation problem into more easily solved local boundary-regression problems and takes into account the topological constraints on the predicted boundaries. Moreover, our framework adopts a coarse-to-fine strategy and uses carefully designed heatmaps for training the boundary localization network. We examine StripNet on two challenging strip-pattern segmentation tasks, retinal layer segmentation and lane detection. Extensive experiments demonstrate that StripNet achieves excellent results and outperforms state-of-the-art methods on both tasks.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127101735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Forgione, A. Carlier, Géraldine Morin, Wei Tsang Ooi, V. Charvillat, P. Yadav
We demonstrate the use of DASH, a widely deployed standard for streaming video content, for streaming 3D content in an NVE (Networked Virtual Environment) consisting of 3D geometry and associated textures. We have developed a DASH client for NVEs to show how they benefit from the advantages of DASH: it offers a scalable, easy-to-deploy 3D streaming framework. In our system, the 3D content is first statically partitioned into DASH-compliant data, and metadata is provided so that the client can manage which data to download. Based on a proposed utility metric for geometry and textures at different resolutions, the client chooses which content to request depending on its viewpoint. We provide a Web-based client that navigates through our sample 3D scene, deriving the streaming requests from its online computation of the necessary parameters, in a receiver-driven manner.
{"title":"An Implementation of a DASH Client for Browsing Networked Virtual Environment","authors":"Thomas Forgione, A. Carlier, Géraldine Morin, Wei Tsang Ooi, V. Charvillat, P. Yadav","doi":"10.1145/3240508.3241398","DOIUrl":"https://doi.org/10.1145/3240508.3241398","url":null,"abstract":"We demonstrate the use of DASH, a widely deployed standard for streaming video content, for streaming 3D content in an NVE (Networked Virtual Environment) consisting of 3D geometry and associated textures. We have developed a DASH client for NVEs to show how they benefit from the advantages of DASH: it offers a scalable, easy-to-deploy 3D streaming framework. In our system, the 3D content is first statically partitioned into DASH-compliant data, and metadata is provided so that the client can manage which data to download. Based on a proposed utility metric for geometry and textures at different resolutions, the client chooses which content to request depending on its viewpoint. We provide a Web-based client that navigates through our sample 3D scene, deriving the streaming requests from its online computation of the necessary parameters, in a receiver-driven manner.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116077964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subspace clustering aims at clustering data points drawn from a union of low-dimensional subspaces. Recently, deep neural networks have been introduced into this problem to improve both representation ability and precision for non-linear data. However, such models are sensitive to noise and outliers, since difficult and easy samples are treated equally. In contrast, in the human cognitive process, individuals tend to follow a learning paradigm from easy to hard and from less to more: human beings first learn simple concepts and then gradually absorb more complicated ones. Inspired by this learning scheme, in this paper we propose a robust deep subspace clustering framework based on the principles of the human cognitive process. Specifically, we measure the easiness of samples dynamically, so that the proposed method can gradually utilize instances from easy to more complex ones in a robust way. Meanwhile, we design a solution that updates the weights and parameters using an alternating optimization strategy, followed by a theoretical analysis demonstrating the rationality of the proposed method. Experimental results on three popular benchmark datasets demonstrate the validity of the proposed method.
{"title":"When to Learn What: Deep Cognitive Subspace Clustering","authors":"Yangbangyan Jiang, Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang","doi":"10.1145/3240508.3240582","DOIUrl":"https://doi.org/10.1145/3240508.3240582","url":null,"abstract":"Subspace clustering aims at clustering data points drawn from a union of low-dimensional subspaces. Recently, deep neural networks have been introduced into this problem to improve both representation ability and precision for non-linear data. However, such models are sensitive to noise and outliers, since difficult and easy samples are treated equally. In contrast, in the human cognitive process, individuals tend to follow a learning paradigm from easy to hard and from less to more: human beings first learn simple concepts and then gradually absorb more complicated ones. Inspired by this learning scheme, in this paper we propose a robust deep subspace clustering framework based on the principles of the human cognitive process. Specifically, we measure the easiness of samples dynamically, so that the proposed method can gradually utilize instances from easy to more complex ones in a robust way. Meanwhile, we design a solution that updates the weights and parameters using an alternating optimization strategy, followed by a theoretical analysis demonstrating the rationality of the proposed method. Experimental results on three popular benchmark datasets demonstrate the validity of the proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122421659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The attention mechanism has greatly promoted the development of Visual Question Answering (VQA). The attention distribution, which weights objects in an image (such as image regions or bounding boxes) differently according to their importance for answering a question, plays a crucial role in the attention mechanism. Most existing work focuses on fusing image features and text features to calculate the attention distribution, without comparisons between different image objects. Yet selectivity, a major property of attention, depends on comparisons between different objects: comparisons provide more information for assigning attention well. To achieve this, we propose an object-difference attention (ODA), which calculates the attention probability by applying a difference operator between different image objects under the guidance of the question at hand. Experimental results on three publicly available datasets show that our ODA-based VQA model achieves state-of-the-art results. Furthermore, a general form of relational attention is proposed, and besides ODA, several other relational attentions are given. Experimental results show that these relational attentions have strengths on different types of questions.
{"title":"Object-Difference Attention: A Simple Relational Attention for Visual Question Answering","authors":"Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong","doi":"10.1145/3240508.3240513","DOIUrl":"https://doi.org/10.1145/3240508.3240513","url":null,"abstract":"The attention mechanism has greatly promoted the development of Visual Question Answering (VQA). The attention distribution, which weights objects in an image (such as image regions or bounding boxes) differently according to their importance for answering a question, plays a crucial role in the attention mechanism. Most existing work focuses on fusing image features and text features to calculate the attention distribution, without comparisons between different image objects. Yet selectivity, a major property of attention, depends on comparisons between different objects: comparisons provide more information for assigning attention well. To achieve this, we propose an object-difference attention (ODA), which calculates the attention probability by applying a difference operator between different image objects under the guidance of the question at hand. Experimental results on three publicly available datasets show that our ODA-based VQA model achieves state-of-the-art results. Furthermore, a general form of relational attention is proposed, and besides ODA, several other relational attentions are given. Experimental results show that these relational attentions have strengths on different types of questions.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122971564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Multimedia -3 (Multimedia Search)","authors":"J. Sang","doi":"10.1145/3286940","DOIUrl":"https://doi.org/10.1145/3286940","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128443584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haolin Ren, B. Renoust, G. Melançon, M. Viaud, S. Satoh
One key task in the analysis of large multimedia archives over time is to dynamically monitor the activity of concepts and entities along with their interactions. This is helpful for analyzing threads of topics across news archives (how stories unfold), or for monitoring the evolution and development of social groups. Dynamic graph modeling is a powerful tool for capturing these interactions over time, but visualization and community detection remain difficult, especially with a high density of links. We propose to extract the backbone of dynamic graphs in order to ease community detection and guide the exploration of trend evolution. Building on the graph structure, we interactively coordinate node-link diagrams, Sankey diagrams, time series, and animations in order to extract patterns and follow community behavior. We illustrate our system by exploring the role of soccer in 6 years of TV/radio magazines in France, and the role of North Korea in about 10 years of Japanese news.
{"title":"Exploring Temporal Communities in Mass Media Archives","authors":"Haolin Ren, B. Renoust, G. Melançon, M. Viaud, S. Satoh","doi":"10.1145/3240508.3241392","DOIUrl":"https://doi.org/10.1145/3240508.3241392","url":null,"abstract":"One key task in the analysis of large multimedia archives over time is to dynamically monitor the activity of concepts and entities along with their interactions. This is helpful for analyzing threads of topics across news archives (how stories unfold), or for monitoring the evolution and development of social groups. Dynamic graph modeling is a powerful tool for capturing these interactions over time, but visualization and community detection remain difficult, especially with a high density of links. We propose to extract the backbone of dynamic graphs in order to ease community detection and guide the exploration of trend evolution. Building on the graph structure, we interactively coordinate node-link diagrams, Sankey diagrams, time series, and animations in order to extract patterns and follow community behavior. We illustrate our system by exploring the role of soccer in 6 years of TV/radio magazines in France, and the role of North Korea in about 10 years of Japanese news.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128512924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized facial action unit (AU) recognition is challenging due to subject-dependent facial behavior. This paper proposes a method to recognize multiple personalized facial AUs through a novel generative adversarial network, which adapts the distribution of source-domain facial images to that of target-domain facial images and detects multiple AUs by leveraging AU dependencies. Specifically, we use a generative adversarial network to generate synthetic images from the source domain; the synthetic images have an appearance similar to the target subject while retaining the AU patterns of the source images. We simultaneously leverage AU dependencies to train a multiple-AU classifier. Experimental results on three benchmark databases demonstrate that the proposed method successfully realizes unsupervised domain adaptation for individual AU detection and thus outperforms state-of-the-art AU detection methods.
{"title":"Personalized Multiple Facial Action Unit Recognition through Generative Adversarial Recognition Network","authors":"Can Wang, Shangfei Wang","doi":"10.1145/3240508.3240613","DOIUrl":"https://doi.org/10.1145/3240508.3240613","url":null,"abstract":"Personalized facial action unit (AU) recognition is challenging due to subject-dependent facial behavior. This paper proposes a method to recognize multiple personalized facial AUs through a novel generative adversarial network, which adapts the distribution of source-domain facial images to that of target-domain facial images and detects multiple AUs by leveraging AU dependencies. Specifically, we use a generative adversarial network to generate synthetic images from the source domain; the synthetic images have an appearance similar to the target subject while retaining the AU patterns of the source images. We simultaneously leverage AU dependencies to train a multiple-AU classifier. Experimental results on three benchmark databases demonstrate that the proposed method successfully realizes unsupervised domain adaptation for individual AU detection and thus outperforms state-of-the-art AU detection methods.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128661992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}