Research on video summarization method based on convolutional neural network

Ke-xin Zheng, Xiang Chen
Neural Networks, Information and Communication Engineering, published 2022-06-30. DOI: 10.1117/12.2639224
Short videos on the Internet are growing exponentially, and a huge number of videos are uploaded every day; video data is also pervasive in everyday life. People can search for and watch all kinds of videos, but this abundance also brings problems. On the one hand, the sheer volume of accumulated video makes it hard to find a desired video quickly, and repeated scenes within videos waste viewers' time and energy; on the other hand, large amounts of video data place enormous pressure on storage. To address inaccurate key frame selection and the open question of which frame features to use in existing video summarization models, this paper proposes a multi-feature video summarization model, DME-VSNet, which extracts three features from each video frame: an importance score, image memory strength, and image entropy. To address inaccurate video shot segmentation, the paper also proposes a shot segmentation algorithm based on the TransNet network, which divides the original video into short shots at the detected shot boundaries. The model then feeds the three features into the proposed MLP architecture to obtain a score for each video frame, and selects key frames by score to generate the video summary. Comparative experiments verify the effectiveness of both the TransNet-based shot segmentation method and the overall CNN-based model, and the results show that summaries generated from the combination of all three features achieve better evaluation results.
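The abstract names the scoring and selection stages but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes image entropy is the Shannon entropy of an 8-bit grayscale histogram (normalized to [0, 1] by dividing by 8 bits), that the importance score and image memory strength arrive as precomputed values in [0, 1] from upstream CNN models, and that the MLP has one ReLU hidden layer and a sigmoid output with hypothetical, untrained weights. Shot boundaries are hard-coded in place of TransNet output, and the rule "take the best-scoring frame in each shot, then keep the top `max_frames`" is an assumed selection policy. All function names and parameters here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def image_entropy(gray: Sequence[int], levels: int = 256) -> float:
    """Shannon entropy (bits) of a grayscale frame's intensity histogram (assumed definition)."""
    hist = [0] * levels
    for p in gray:
        hist[p] += 1
    n = len(gray)
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


def mlp_frame_score(features: Sequence[float],
                    w1: Sequence[Sequence[float]], b1: Sequence[float],
                    w2: Sequence[float], b2: float) -> float:
    """Hypothetical fusion MLP: one ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def select_key_frames(scores: Sequence[float],
                      shots: Sequence[Tuple[int, int]],
                      max_frames: int) -> List[int]:
    """Assumed policy: best frame per shot [start, end), then the top `max_frames` overall."""
    per_shot_best = [max(range(start, end), key=lambda i: scores[i])
                     for start, end in shots]
    per_shot_best.sort(key=lambda i: scores[i], reverse=True)
    return sorted(per_shot_best[:max_frames])  # key frames in temporal order


if __name__ == "__main__":
    # Toy per-frame inputs; real values would come from upstream models (hypothetical here).
    importance = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3]
    memory_strength = [0.5, 0.6, 0.2, 0.3, 0.8, 0.4]
    frames = [[0] * 16, [0, 255] * 8, list(range(0, 256, 16)),
              [0] * 16, [0, 255] * 8, [0] * 16]

    # Hypothetical, untrained MLP weights: 3 inputs -> 2 hidden units -> 1 score.
    W1 = [[1.0, 1.0, 0.5], [0.5, -1.0, 1.0]]
    B1 = [0.0, 0.0]
    W2 = [1.0, 0.5]
    B2 = -1.0

    scores = []
    for imp, mem, gray in zip(importance, memory_strength, frames):
        ent = image_entropy(gray) / math.log2(256)  # normalize entropy to [0, 1]
        scores.append(mlp_frame_score([imp, mem, ent], W1, B1, W2, B2))

    shots = [(0, 3), (3, 6)]  # stand-in for TransNet shot boundaries
    print(select_key_frames(scores, shots, max_frames=2))  # → [1, 4]
```

Picking one candidate per shot before ranking is a simple way to keep the summary from drawing several near-duplicate frames from the same shot, which matches the abstract's motivation of avoiding repeated scenes; the paper may use a different selection rule.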