
Latest Publications in Proceedings of the ACM Multimedia Asia

Deep Distillation Metric Learning
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366560
Jiaxu Han, Tianyu Zhao, Changqing Zhang
Due to the emergence of large-scale, high-dimensional data, measuring the similarity between data points has become challenging. To obtain effective representations, metric learning has become one of the most active research areas in computer vision and pattern recognition. However, models that use trained networks for prediction are often cumbersome and difficult to deploy. In this paper, we therefore propose Deep Distillation Metric Learning (DDML), a novel method that performs online teaching while learning the distance metric. Specifically, we employ model distillation to transfer the knowledge acquired by a larger model to a smaller one. Unlike two-step offline distillation or mutual online learning, we train a powerful teacher model that transfers its knowledge to a lightweight, generalizable student model and is itself iteratively improved by feedback from the student. We show that our method achieves state-of-the-art results on CUB200-2011 and CARS196 while offering advantages in computational efficiency.
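The abstract does not spell out the DDML objective; as a generic illustration of the distillation ingredient it builds on — matching temperature-softened teacher and student distributions with a KL-divergence loss — here is a minimal, self-contained Python sketch (all names and the temperature value are illustrative, not from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax: higher T yields a softer distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence KL(teacher || student) over softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.5, 0.3, -1.0]   # toy teacher logits
student = [1.8, 0.5, -0.7]   # toy student logits
loss = distillation_loss(teacher, student)
```

The loss is zero exactly when the two softened distributions coincide, which is what drives the student toward the teacher's predictions.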
Citations: 4
Comprehensive Event Storyline Generation from Microblogs
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366601
Wenjin Sun, Yuhang Wang, Yuqi Gao, Zesong Li, J. Sang, Jian Yu
Microblogging data contains a wealth of information about trending events and has gained increasing attention from users, organizations, and researchers mining social media across disciplines. Event storyline generation is a typical social media mining task whose goal is to extract the development stages of an event together with descriptions of each stage. Existing storyline generation methods either produce storylines that lack integrity or fail to guarantee coherence between the discovered stages. Moreover, there is no established method for evaluating storyline quality. In this paper, we propose a comprehensive storyline generation framework that addresses these shortcomings. Given microblogging data related to a specified event, we first propose a hot-word-based stage detection algorithm to identify the potential stages of the event, which effectively avoids missing important stages and prevents inconsistent ordering between stages. A community detection algorithm is then applied to select representative data for each stage. Finally, we apply a graph optimization algorithm to generate logically coherent storylines for the event. We also introduce a new evaluation metric, SLEU, that emphasizes the integrity and coherence of the generated storyline. Extensive experiments on real-world Chinese microblogging data demonstrate the effectiveness of each module and of the overall framework.
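The paper's hot-word-based stage detection is not detailed in the abstract; the sketch below shows one plausible reading of the idea — track the top-k words per time window and start a new stage when the hot-word set shifts sharply. The Jaccard threshold and window layout are assumptions for illustration only:

```python
from collections import Counter

def hot_words(posts, k=3):
    """Top-k most frequent words across one time window of posts."""
    counts = Counter(w for post in posts for w in post.split())
    return {w for w, _ in counts.most_common(k)}

def detect_stages(windows, k=3, jaccard_threshold=0.5):
    """Group consecutive windows into stages; a sharp hot-word shift
    (low Jaccard similarity to the previous window) opens a new stage."""
    stages, current = [], None
    for i, posts in enumerate(windows):
        hw = hot_words(posts, k)
        if current is None:
            stages.append([i])
        else:
            inter = len(hw & current)
            union = len(hw | current) or 1
            if inter / union < jaccard_threshold:
                stages.append([i])       # hot words changed: new stage
            else:
                stages[-1].append(i)     # similar hot words: same stage
        current = hw
    return stages

windows = [
    ["storm warning issued", "storm warning city"],
    ["storm hits city", "storm damage city"],
    ["storm damage city", "city damage heavy"],
]
stages = detect_stages(windows)
```

On the toy windows above, the vocabulary shift between the first and second windows opens a second stage, while the third window stays in it.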
Citations: 3
Selective Attention Network for Image Dehazing and Deraining
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366688
Xiao Liang, Runde Li, Jinhui Tang
Image dehazing and deraining are important low-level computer vision tasks. In this paper, we propose a novel method named Selective Attention Network (SAN) to solve both problems. Because the density of haze and the directions of rain streaks are complex and non-uniform, SAN adopts channel-wise attention and spatial-channel attention to remove rain streaks and haze both globally and locally. To better capture the varied details of rain and haze, we propose a Selective Attention Module (SAM) that re-scales the channel-wise and spatial-channel attention instead of using simple element-wise summation. In addition, we conduct ablation studies to validate the effectiveness of each module of SAN. Extensive experimental results on synthetic and real-world datasets show that SAN performs favorably against state-of-the-art methods.
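The abstract names channel-wise attention without defining it; as a framework-free illustration of the general mechanism (not the paper's SAN/SAM architecture), the toy sketch below pools each channel to a scalar, passes it through a sigmoid gate, and rescales the channel — the per-channel gating weights are a hypothetical stand-in for learned parameters:

```python
import math

def channel_attention(feature_map, weights):
    """Channel-wise attention sketch: pool each channel, gate it, rescale.

    feature_map: list of channels, each a 2-D list of activations.
    weights: one toy gating parameter per channel (would be learned).
    """
    gates = []
    for channel, w in zip(feature_map, weights):
        pooled = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gates.append(1.0 / (1.0 + math.exp(-w * pooled)))   # sigmoid gate
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(feature_map, gates)]

fmap = [[[2.0, 2.0], [2.0, 2.0]]]      # one channel, 2x2 activations
out = channel_attention(fmap, [0.0])   # zero weight -> gate is exactly 0.5
```

A gate near 1 preserves a channel, a gate near 0 suppresses it — which is how such attention can emphasize rain- or haze-related channels.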
Citations: 8
Session details: Vision in Multimedia
Pub Date: 2019-12-15 DOI: 10.1145/3379197
H. Hang
Citations: 0
A Performance-Aware Selection Strategy for Cloud-based Video Services with Micro-Service Architecture
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366609
Zhengjun Xu, Haitao Zhang, Han Huang
The cloud micro-service architecture provides loosely coupled services and efficient virtual resources, making it a promising solution for large-scale video services. However, efficiently selecting the optimal services under a micro-service architecture is difficult, because the large number of micro-services leads to an exponential increase in the number of candidate service-selection solutions. In addition, the time sensitivity of video services increases the complexity of service selection, and the video data itself can affect the selection results. Current video service selection strategies are insufficient under a micro-service architecture because they do not comprehensively account for the resource fluctuation of service instances and the characteristics of video services. In this paper, we focus on the video service selection strategy under a micro-service architecture. First, we propose a QoS Prediction (QP) method using explicit factor analysis and linear regression; QP accurately predicts QoS values based on the features of the video data and service instances. Second, we propose a Performance-Aware Video Service Selection (PVSS) method: we prune the candidate services to reduce computational complexity and then efficiently select the optimal solution with the Fruit Fly Optimization (FFO) algorithm. Finally, we conduct extensive experiments to evaluate our strategy, and the results demonstrate its effectiveness.
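The QP method pairs explicit factor analysis with linear regression; as a minimal stand-in for the regression half alone, the sketch below fits a one-feature ordinary-least-squares model (the idea of predicting a QoS value such as latency from a workload feature is illustrative — the paper's actual features and model are not given here):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form, one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

def predict(a, b, x):
    """Predicted QoS value for a new feature value x."""
    return a * x + b

# Toy data lying exactly on y = 2x + 1.
a, b = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

With multiple features the same closed form generalizes to the normal equations, but the one-feature case already shows the fit/predict split the abstract describes.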
Citations: 0
Active Perception Network for Salient Object Detection
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366580
Junhang Wei, Shuhui Wang, Liang Li, Qingming Huang
To obtain better saliency maps for salient object detection, recent methods fuse features from different levels of convolutional neural networks and have achieved remarkable progress. However, the differences between feature levels complicate the fusion process, which may lead to unsatisfactory saliency predictions. To address this issue, we propose the Active Perception Network (APN) to enhance inter-feature consistency for salient object detection. First, a Mutual Projection Module (MPM) is developed to fuse different features: it uses high-level features as guidance to extract complementary components from low-level features, suppressing background noise and improving semantic consistency. A Self Projection Module (SPM), which can be regarded as an extended residual connection, further refines the fused features; features passed through SPM produce more accurate saliency maps. Finally, we propose a Head Projection Module (HPM) to aggregate global information, which brings strong semantic consistency to the whole network. Comprehensive experiments on five benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches on different evaluation metrics.
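The abstract describes SPM as an extended residual connection; the sketch below shows only the plain residual pattern it extends — output = input + refinement(input) — with a hypothetical toy refinement function, not the paper's module:

```python
def residual_refine(features, refine):
    """Residual refinement: output = input + refinement(input)."""
    return [x + r for x, r in zip(features, refine(features))]

# Toy refinement (illustrative only): pull each value toward the mean.
def toward_mean(features):
    m = sum(features) / len(features)
    return [0.5 * (m - x) for x in features]

refined = residual_refine([0.0, 2.0], toward_mean)
```

Because the input is passed through unchanged and only a correction is added, the refinement stage can never make the features worse than doing nothing — the property that makes residual connections attractive for feature refinement.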
Citations: 0
Video Summarization based on Sparse Subspace Clustering with Automatically Estimated Number of Clusters
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366593
Pengyi Hao, Edwin Manhando, Taotao Ye, Cong Bai
Advances in technology have sharply increased the number of digital cameras at people's disposal all across the world. Consequently, the huge storage space consumed by videos from these devices makes video processing and analysis time-consuming, and also slows down video browsing and retrieval. Video summarization plays a crucial role in solving these issues. Across the many video summarization approaches proposed to date, the goal is to condense a long video into a short video skim without losing the meaning or message of the original. This is done by selecting the important frames, called key-frames. The approach proposed in this work automatically summarizes digital videos based on the deep features of detected objects. To this end, we apply sparse subspace clustering, with an automatically estimated number of clusters, to the objects' deep features. The generated summary stores the meta-data for each short video inferred from the clustering results. In this paper, we also contribute a new video dataset for video summarization. We evaluate our work on the TVSum dataset and on our own video summarization dataset.
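The paper's estimator for the number of clusters is not described in the abstract; one simple way to make the point concrete — an assumption, not the paper's sparse-subspace method — is to count the connected components of the graph whose edges are affinities above a threshold:

```python
def estimate_clusters(affinity, threshold=0.5):
    """Estimate the number of clusters as the number of connected
    components of the graph with edges where affinity >= threshold."""
    n = len(affinity)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Two clear blocks of mutually similar items -> two clusters.
affinity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
```

In sparse subspace clustering the affinity matrix would come from the self-expressive coefficients; here it is a hand-written toy so the counting step stands alone.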
Citations: 2
Session details: Poster Session
Pub Date: 2019-12-15 DOI: 10.1145/3379191
Ting Gan
Citations: 0
Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366583
Yiyan Chen, Li Tao, Xueting Wang, T. Yamasaki
Conventional video summarization approaches based on reinforcement learning suffer from the problem that the reward is only received after the whole summary has been generated. Such a reward is sparse and makes reinforcement learning hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to enhance summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video shots in the subtask via policy gradient, using both the global reward and newly defined sub-rewards to overcome the sparsity problem. Experiments on two benchmark datasets show that our proposal achieves the best performance, even surpassing supervised approaches.
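The worker's policy-gradient update is not spelled out in the abstract; the following is a deliberately tiny REINFORCE sketch on a single binary "select this shot or not" decision — a toy reward and Bernoulli policy of my own, not the paper's manager/worker networks — just to show the update rule the approach relies on:

```python
import math
import random

def train_shot_selector(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on one binary choice: select (1) or skip (0) a shot.

    rewards: (reward_for_skip, reward_for_select).
    Returns the learned probability of selecting the shot.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))       # Bernoulli policy
        action = 1 if rng.random() < p else 0
        reward = rewards[action]
        # Gradient of log pi(action) w.r.t. theta for a Bernoulli policy.
        grad = (1.0 - p) if action == 1 else -p
        theta += lr * reward * grad              # REINFORCE update
    return 1.0 / (1.0 + math.exp(-theta))

p_select = train_shot_selector((0.0, 1.0))  # reward only for selecting
```

When selecting the shot is the only rewarded action, the selection probability climbs toward 1; the hierarchical framework's contribution is to supply denser sub-rewards so such updates arrive before the whole summary is finished.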
Citations: 39
Deep Feature Interaction Embedding for Pair Matching Prediction
Pub Date: 2019-12-15 DOI: 10.1145/3338533.3366597
Luwei Zhang, Xueting Wang, T. Yamasaki
Online dating services have become popular in modern society. Predicting pair matches between two users of these services can efficiently increase the chance of their finding life partners. Deep learning methods with automatic feature-interaction functions, such as Factorization Machines (FM) and the cross network of the Deep & Cross Network (DCN), can model sparse categorical features and are effective for many recommendation tasks in web applications. To solve the partner recommendation task, we improve these FM-based deep models and DCN by enhancing the representation of feature-interaction embeddings and by proposing a novel interaction-layer design that avoids information loss. Through experiments on two real-world datasets from two online dating companies, we demonstrate the superior performance of our designs.
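For readers unfamiliar with the FM feature-interaction term the abstract builds on, here is its standard second-order form (this is the classic FM formula, not the paper's improved embedding): the pairwise sum over factor dot products, computed with the well-known O(k·n) identity instead of the naive O(n²) loop:

```python
def fm_pairwise(x, V):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j,
    via 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    k = len(V[0])              # number of latent factors
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - sq
    return 0.5 * total

# Toy feature vector and latent factor matrix (3 features, k=2 factors).
x = [1.0, 2.0, 3.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
interaction = fm_pairwise(x, V)
```

For the toy values above, the pairwise dot products give 0·1·2 + 1·1·3 + 1·2·3 = 9, which the fast identity reproduces exactly.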
Citations: 1