Video Summarization Based on Fusing Features and Shot Segmentation
Xuming Feng, Yaping Zhu, Cheng Yang
2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC), published 2021-11-17
DOI: https://doi.org/10.1109/IC-NIDC54101.2021.9660579
Video summarization creates short summaries of original videos while retaining their most representative information. Traditional deep-learning models for video summarization mostly use the frame as the basic processing unit, so hardware limitations prevent them from handling long videos. In this paper, we compress frame-level features into shot-level features using a feature extractor based on a Convolutional Neural Network (CNN), which can improve training accuracy and reduce computation. We also propose a feature fusion algorithm based on the capsule network, which combines the RGB features and optical flow features of the video into deep features with adaptive weights to enhance the original video features. Experimental results on two benchmark datasets (TVSum and SumMe) validate the effectiveness of our method.
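
The two ideas in the abstract, shot-level compression and adaptively weighted fusion, can be illustrated with a minimal NumPy sketch. It is not the authors' method: mean pooling stands in for the CNN-based extractor, a per-shot softmax over two scores stands in for the capsule-network fusion, and the names `pool_shots`, `fuse_adaptive`, and `scores` are hypothetical.

```python
import numpy as np


def pool_shots(frame_feats, boundaries):
    """Average the frame-level features inside each shot.

    frame_feats: array of shape (num_frames, dim).
    boundaries: list of (start, end) frame indices, end exclusive.
    Assumption: mean pooling replaces the paper's CNN-based extractor.
    """
    return np.stack([frame_feats[s:e].mean(axis=0) for s, e in boundaries])


def fuse_adaptive(rgb, flow, scores):
    """Fuse two modalities with per-shot softmax weights.

    rgb, flow: arrays of shape (num_shots, dim).
    scores: array of shape (num_shots, 2) of unnormalized modality scores.
    Assumption: in a trained model these scores would be learned; here
    they are supplied by the caller.
    """
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights[:, :1] * rgb + weights[:, 1:] * flow


# Toy data: 10 frames with 4-dimensional features, split into 3 shots.
rng = np.random.default_rng(0)
rgb_frames = rng.normal(size=(10, 4))
flow_frames = rng.normal(size=(10, 4))
shots = [(0, 3), (3, 7), (7, 10)]

rgb_shots = pool_shots(rgb_frames, shots)
flow_shots = pool_shots(flow_frames, shots)

# Equal scores give equal weights, so the result is the plain average.
fused = fuse_adaptive(rgb_shots, flow_shots, np.zeros((3, 2)))
print(fused.shape)
```

Pooling reduces the sequence length from the number of frames to the number of shots, which is where the computational saving described in the abstract would come from; the fusion step keeps the feature dimension unchanged.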