Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning

Proceedings of the ACM Multimedia Asia · Pub Date: 2019-12-15 · DOI: 10.1145/3338533.3366583

Yiyan Chen, Li Tao, Xueting Wang, T. Yamasaki


Citations: 39

Abstract

Conventional reinforcement-learning approaches to video summarization receive a reward only after the whole summary has been generated. This reward is sparse, which makes reinforcement learning hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prevents the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to improve summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video shots in the subtask by policy gradient, using both a global reward and newly defined sub-rewards to overcome the reward sparsity problem. Experiments on two benchmark datasets show that our proposal achieves the best performance, even outperforming supervised approaches.
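The worker update described in the abstract can be illustrated with a toy REINFORCE step. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear-sigmoid worker policy, the scalar subgoal used as a bias, the fixed-length subtask split, and the `toy_sub_rewards` function are all hypothetical stand-ins, since the abstract does not specify the network architectures or the reward definitions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def split_into_subtasks(num_shots, subtask_len):
    # Assumption: subtasks are contiguous, fixed-length groups of shots.
    return [list(range(s, min(s + subtask_len, num_shots)))
            for s in range(0, num_shots, subtask_len)]


def worker_probs(features, weights, subgoals, subtasks):
    # Hypothetical worker: a linear score per shot, shifted by the
    # manager's scalar subgoal for the subtask containing that shot.
    probs = [0.0] * len(features)
    for k, idxs in enumerate(subtasks):
        for i in idxs:
            z = sum(w * x for w, x in zip(weights, features[i])) + subgoals[k]
            probs[i] = sigmoid(z)
    return probs


def sample_actions(probs, rng):
    # Bernoulli selection: 1 keeps the shot in the summary, 0 drops it.
    return [1 if rng.random() < p else 0 for p in probs]


def toy_sub_rewards(actions, subtasks, labels):
    # Placeholder sub-reward (NOT the paper's definition): agreement between
    # the fraction of selected shots and the subtask's binary label.
    rewards = []
    for k, idxs in enumerate(subtasks):
        frac = sum(actions[i] for i in idxs) / len(idxs)
        rewards.append(1.0 - abs(frac - labels[k]))
    return rewards


def reinforce_gradient(features, probs, actions, subtasks,
                       sub_rewards, global_reward, baseline=0.0):
    # REINFORCE for a Bernoulli-sigmoid policy: d log pi / d w = (a - p) * x.
    # Each shot is credited with the global reward plus its subtask's
    # sub-reward, so the learning signal is dense rather than end-of-episode.
    grad = [0.0] * len(features[0])
    for k, idxs in enumerate(subtasks):
        ret = global_reward + sub_rewards[k] - baseline
        for i in idxs:
            coeff = ret * (actions[i] - probs[i])
            for d, x in enumerate(features[i]):
                grad[d] += coeff * x
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[rng.random(), rng.random()] for _ in range(6)]
    tasks = split_into_subtasks(len(feats), 3)
    p = worker_probs(feats, [0.0, 0.0], [0.5, -0.5], tasks)
    a = sample_actions(p, rng)
    r_sub = toy_sub_rewards(a, tasks, labels=[1, 0])
    print(reinforce_gradient(feats, p, a, tasks, r_sub, global_reward=1.0))
```

In this sketch the worker weights would be updated by gradient ascent, for example `w[d] += lr * grad[d]` with a hypothetical learning rate `lr`. The point of the example is only the credit assignment: every shot receives feedback from its own subtask in addition to the single summary-level reward.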


