Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework

Vishal Kaushal, S. Kothawade, Rishabh K. Iyer, Ganesh Ramakrishnan
DOI: 10.1145/3422839.3423064
Published in: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery, 2020-07-29
Citations: 9

Abstract

Automatic video summarization is still an unsolved problem due to several challenges. We take steps towards making it more realistic by addressing the following. First, currently available datasets either have very short videos or contain only a few long videos of a particular type. We introduce a new benchmark dataset, VISIOCITY, which comprises longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and other vision problems. Second, for long videos, the human reference summaries required by supervised video summarization techniques are difficult to obtain. We present a novel recipe based on Pareto optimality to automatically generate multiple reference summaries from the indirect ground truth present in VISIOCITY, and we show that these summaries are on par with human summaries. Third, we demonstrate that in the presence of multiple ground-truth summaries (due to the highly subjective nature of the task), learning from a single combined ground-truth summary using a single loss function is not a good idea. We propose a simple recipe, VISIOCITY-SUM, that enhances an existing model using a combination of losses, and we demonstrate that it beats the current state-of-the-art techniques. We also present a study of the different desired characteristics of a good summary and demonstrate that evaluating a summary with a single measure (say, F1), as is the current typical practice, falls short in some ways. We propose an evaluation framework for better quantitative assessment of summary quality that is closer to human judgment than a single measure. We report the performance of a few representative video summarization techniques on VISIOCITY assessed using various measures, bring out the limitations of the techniques and/or the assessment mechanism in modeling human judgment, and demonstrate the effectiveness of our evaluation framework in doing so.
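The abstract's recipe for generating multiple reference summaries rests on Pareto optimality: when candidate summaries are scored on several quality measures at once, the ones worth keeping are those not dominated by any other candidate. The paper's actual recipe works from VISIOCITY's indirect ground-truth annotations; the sketch below is only a minimal, hypothetical illustration of the non-domination idea, with made-up candidate names and score vectors.

```python
# Minimal sketch (not the paper's algorithm): Pareto-optimal selection of
# candidate summaries, each scored on several quality measures (higher is
# better). A candidate survives only if no other candidate is at least as
# good on every measure and strictly better on at least one.

def dominates(a, b):
    """True if score vector a Pareto-dominates score vector b."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the (name, scores) pairs whose score vectors are non-dominated."""
    return [
        (name, scores)
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates if other != scores)
    ]

# Hypothetical example: three candidate summaries scored on (diversity, importance).
candidates = [("s1", (0.8, 0.4)), ("s2", (0.6, 0.7)), ("s3", (0.5, 0.3))]
front = pareto_front(candidates)
print([name for name, _ in front])  # "s3" is dominated by both "s1" and "s2"
```

Here "s1" and "s2" survive because each beats the other on one measure; selecting the whole front rather than one maximizer is what yields multiple, equally defensible reference summaries.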