Video Summarization Via Actionness Ranking

2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Pub Date: 2019-01-01, DOI: 10.1109/WACV.2019.00085

Mohamed Elfeki, A. Borji


Citations: 35

Abstract

To automatically produce a brief yet expressive summary of a long video, an algorithm should begin by mimicking the human process of summary generation. Prior work proposed supervised and unsupervised algorithms that train models to learn the underlying human behavior, either by increasing modeling complexity or by hand-crafting better heuristics to simulate the human summary generation process. In this work, we take a different approach and analyze a major cue that humans exploit for summary generation: the nature and intensity of actions. We empirically observed that a frame is more likely to be included in human-generated summaries if it contains a substantial amount of deliberate motion performed by an agent, which is referred to as actionness. We therefore hypothesize that learning to automatically generate summaries involves implicit knowledge of actionness estimation and ranking. We validate this hypothesis with a user study that explores the correlation between human-generated summaries and actionness ranks. We also run a consensus and behavioral analysis across human subjects to ensure reliable and consistent results. The analysis shows a considerable degree of agreement among subjects in the collected data and supports our initial hypothesis. Based on the study findings, we develop a method that incorporates actionness data to explicitly regulate a learning algorithm trained for summary generation. We assess the performance of our approach on four summarization benchmark datasets and demonstrate an evident advantage over state-of-the-art summarization methods.
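The abstract does not say how the correlation between human-generated summaries and actionness ranks was measured. As one plausible illustration only, the sketch below computes a Spearman rank correlation between per-frame actionness scores and the fraction of annotators who selected each frame. The function names, the choice of Spearman's coefficient, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy data: one actionness score per frame, and the fraction of
# annotators who kept that frame in their summary.
actionness = [0.10, 0.80, 0.35, 0.90, 0.05, 0.60]
selection_rate = [0.0, 0.8, 0.2, 1.0, 0.0, 0.6]

rho = spearman(actionness, selection_rate)
print(f"Spearman rho = {rho:.3f}")
```

A value of rho close to 1 would indicate that frames ranked high in actionness also tend to be the frames people keep, which is the kind of relationship the hypothesis predicts.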

