"Deep Distillation Metric Learning" by Jiaxu Han, Tianyu Zhao, Changqing Zhang. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366560

Due to the emergence of large-scale, high-dimensional data, measuring the similarity between data points has become challenging. To obtain effective representations, metric learning has become one of the most active research areas in computer vision and pattern recognition. However, models that rely on large trained networks for prediction are often cumbersome and difficult to deploy. In this paper, we therefore propose deep distillation metric learning (DDML), a novel online-teaching approach to learning a distance metric. Specifically, we employ model distillation to transfer the knowledge acquired by a larger model to a smaller one. Unlike two-step offline distillation or mutual online learning, we train a powerful teacher model that transfers its knowledge to a lightweight, generalizable student model and is itself iteratively improved by feedback from the student. We show that our method achieves state-of-the-art results on CUB200-2011 and CARS196 while offering advantages in computational efficiency.
"Comprehensive Event Storyline Generation from Microblogs" by Wenjin Sun, Yuhang Wang, Yuqi Gao, Zesong Li, J. Sang, Jian Yu. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366601

Microblogging data contains a wealth of information about trending events and has gained increasing attention among users, organizations, and researchers for social media mining across disciplines. Event storyline generation is a typical social media mining task whose goal is to extract the development stages of an event together with descriptions of each stage. Existing storyline generation methods either produce storylines that lack integrity or fail to guarantee coherence between the discovered stages. Moreover, there is no principled method for evaluating the quality of a storyline. In this paper, we propose a comprehensive storyline generation framework that addresses these shortcomings. Given microblogging data related to a specified event, we first propose a hot-word-based stage detection algorithm to identify the potential stages of the event, which effectively avoids omitting important stages and prevents inconsistent ordering between stages. A community detection algorithm is then applied to select representative data for each stage. Finally, we use a graph optimization algorithm to generate logically coherent storylines for the event. We also introduce a new evaluation metric, SLEU, which emphasizes the integrity and coherence of the generated storyline. Extensive experiments on real-world Chinese microblogging data demonstrate the effectiveness of each module and of the overall framework.
"Selective Attention Network for Image Dehazing and Deraining" by Xiao Liang, Runde Li, Jinhui Tang. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366688

Image dehazing and deraining are important low-level computer vision tasks. In this paper, we propose a novel method named Selective Attention Network (SAN) to solve both problems. Because the density of haze and the directions of rain streaks are complex and non-uniform, SAN adopts channel-wise attention and spatial-channel attention to remove rain streaks and haze both globally and locally. To better capture diverse rain and haze details, we propose a Selective Attention Module (SAM) that re-scales the channel-wise and spatial-channel attention instead of combining them by simple element-wise summation. In addition, we conduct ablation studies to validate the effectiveness of each module of SAN. Extensive experimental results on synthetic and real-world datasets show that SAN performs favorably against state-of-the-art methods.
{"title":"Session details: Vision in Multimedia","authors":"H. Hang","doi":"10.1145/3379197","DOIUrl":"https://doi.org/10.1145/3379197","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":" August","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113946847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A Performance-Aware Selection Strategy for Cloud-based Video Services with Micro-Service Architecture" by Zhengjun Xu, Haitao Zhang, Han Huang. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366609

The cloud micro-service architecture provides loosely coupled services and efficient virtual resources, making it a promising solution for large-scale video services. However, efficiently selecting the optimal services under a micro-service architecture is difficult, because the large number of micro-services leads to an exponential increase in the number of candidate service-selection solutions. In addition, the time sensitivity of video services increases the complexity of service selection, and the video data itself can affect the selection results. Current video service selection strategies are insufficient under a micro-service architecture because they do not comprehensively account for the resource fluctuation of service instances or the characteristics of video services. In this paper, we focus on the video service selection strategy under a micro-service architecture. First, we propose a QoS Prediction (QP) method using explicit factor analysis and linear regression; QP accurately predicts QoS values based on the features of the video data and the service instances. Second, we propose a Performance-Aware Video Service Selection (PVSS) method: we prune the candidate services to reduce computational complexity and then efficiently select the optimal solution with a Fruit Fly Optimization (FFO) algorithm. Finally, we conduct extensive experiments to evaluate our strategy, and the results demonstrate its effectiveness.
"Active Perception Network for Salient Object Detection" by Junhang Wei, Shuhui Wang, Liang Li, Qingming Huang. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366580

To produce better saliency maps for salient object detection, recent methods fuse features from different levels of convolutional neural networks and have achieved remarkable progress. However, the differences between feature levels complicate the fusion process, which may lead to unsatisfactory saliency predictions. To address this issue, we propose the Active Perception Network (APN) to enhance inter-feature consistency for salient object detection. First, a Mutual Projection Module (MPM) is developed to fuse different features: it uses high-level features as guidance to extract complementary components from low-level features, suppressing background noise and improving semantic consistency. A Self Projection Module (SPM), which can be viewed as an extended residual connection, is designed to further refine the fused features; features that pass through SPM produce more accurate saliency maps. Finally, we propose a Head Projection Module (HPM) to aggregate global information, which brings strong semantic consistency to the whole network. Comprehensive experiments on five benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches on multiple evaluation metrics.
"Video Summarization based on Sparse Subspace Clustering with Automatically Estimated Number of Clusters" by Pengyi Hao, Edwin Manhando, Taotao Ye, Cong Bai. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366593

Advances in technology have led to sharp growth in the number of digital cameras at people's disposal all across the world. Consequently, the huge storage space consumed by videos from these devices makes video processing and analysis time-consuming, and it also slows down video browsing and retrieval. Video summarization plays a crucial role in solving these issues. Among the many video summarization approaches proposed to date, the common goal is to take a long video and generate a summary in the form of a short video skim, without losing the meaning or the message of the original, by selecting the important frames, called key-frames. The approach proposed in this work performs automatic summarization of digital videos based on the deep features of detected objects. To this end, we apply sparse subspace clustering, with an automatically estimated number of clusters, to the objects' deep features. The summary generated by our scheme stores the metadata for each short video inferred from the clustering results. In this paper, we also introduce a new video dataset for video summarization, and we evaluate the performance of our work on the TVSum dataset and on our own dataset.
"Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning" by Yiyan Chen, Li Tao, Xueting Wang, T. Yamasaki. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366583

Conventional video summarization approaches based on reinforcement learning suffer from the problem that the reward can only be received after the whole summary is generated; such a reward is sparse and makes reinforcement learning hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to enhance summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video shots in the subtask via policy gradient, using both a global reward and newly defined sub-rewards to overcome the sparsity problem. Experiments on two benchmark datasets show that our approach achieves the best performance, surpassing even supervised approaches.
"Deep Feature Interaction Embedding for Pair Matching Prediction" by Luwei Zhang, Xueting Wang, T. Yamasaki. Proceedings of the ACM Multimedia Asia, 2019. https://doi.org/10.1145/3338533.3366597

Online dating services have become popular in modern society. Predicting pair matches between two users of these services can efficiently increase the chance of finding a life partner. Deep learning methods with automatic feature-interaction mechanisms, such as Factorization Machines (FM) and the cross network of the Deep & Cross Network (DCN), can model sparse categorical features and are effective for many recommendation tasks in web applications. To solve the partner recommendation task, we improve these FM-based deep models and DCN by enhancing the representation of feature-interaction embeddings and by proposing a novel interaction-layer design that avoids information loss. Through experiments on two real-world datasets from two online dating companies, we demonstrate the superior performance of our proposed designs.