Motion Based Video Skimming

I. Alam, Devesh Jalan, Priti Shaw, Partha Pratim Mohanta
Published in: 2020 IEEE Calcutta Conference (CALCON), February 2020
DOI: 10.1109/CALCON49167.2020.9106488
Citations: 2

Abstract

Automatic video summarization provides an efficient browsing and searching mechanism for long videos. Video skimming is one of the popular ways to represent a summary of a full-length video. This work describes an unsupervised technique that automatically extracts the important clips from an input video and generates a summarized version of that video. The proposed video-skimming scheme consists of three parts: extraction of motion-based features, selection of important clips, and detection and removal of any shot boundary within a clip. Each frame is represented by a 32-dimensional feature vector generated from the slope and magnitude of its motion vectors. A set of representative frames for the entire video is obtained using k-means clustering followed by a Maximal Spanning Tree (MxST). These representative frames serve as the center points of the clips to be generated: a window is placed around each representative frame to form a clip. A shot boundary may exist within a clip. To detect such a boundary, a method is proposed that considers the variation in the pixel intensities of the frames of a clip; this variation is captured by the standard deviation of the pixel-intensity distribution. The clips are re-formed when a boundary is detected. Finally, the skim is generated by concatenating the extracted clips in sequential order. The resulting summaries are concise and properly represent the input videos. Experiments on two benchmark datasets, SumMe and TVSum, show that the proposed method outperforms state-of-the-art methods.
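The clip-level shot-boundary check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: each frame is summarized by the standard deviation of its pixel-intensity distribution, and an abrupt jump in that statistic between consecutive frames is flagged as a shot boundary. The jump test and the `threshold` value are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def frame_stat(frame: np.ndarray) -> float:
    """Standard deviation of the frame's grayscale pixel-intensity distribution."""
    return float(np.std(frame))

def find_shot_boundary(clip, threshold: float = 20.0):
    """Return the index of the first frame whose intensity-spread difference
    from the previous frame exceeds `threshold` (an assumed jump test),
    or None if the clip appears to contain a single shot."""
    stats = [frame_stat(f) for f in clip]
    for i in range(1, len(stats)):
        if abs(stats[i] - stats[i - 1]) > threshold:
            return i  # boundary lies between frame i-1 and frame i
    return None

# Toy clip: 5 flat (zero-variance) frames followed by 5 high-contrast frames,
# mimicking a hard cut between two visually different shots.
rng = np.random.default_rng(0)
flat = [np.full((48, 64), 128, dtype=np.uint8) for _ in range(5)]
busy = [rng.integers(0, 256, size=(48, 64), dtype=np.uint8) for _ in range(5)]
clip = flat + busy
print(find_shot_boundary(clip))  # -> 5, the cut position
```

When a boundary is found, the scheme re-forms the clip around the representative frame so that the final skim does not include frames from two different shots in one clip.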