Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao
{"title":"通过提取无参数运动注意力实现有效的视频总结","authors":"Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao","doi":"10.1145/3654670","DOIUrl":null,"url":null,"abstract":"<p>Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information which can not be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially contained in adjacent video frames, using a multi-head attention architecture. The PMAM requires no additional training for model parameters, leading to an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), integrating the parameter-free motion attention module with local and global multi-head attention based on object-centric and scene-centric video representations. The synergistic combination of local motion information, extracted by the proposed PMAM, with long-range interactions modeled by the local and global multi-head attention mechanism, can significantly enhance the performance of video summarization. Extensive experimental results on the benchmark datasets, SumMe and TVSum, demonstrate that the proposed MMAN outperforms other state-of-the-art methods, resulting in remarkable performance gains.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"2015 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Effective Video Summarization by Extracting Parameter-free Motion Attention\",\"authors\":\"Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao\",\"doi\":\"10.1145/3654670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information which can not be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially contained in adjacent video frames, using a multi-head attention architecture. The PMAM requires no additional training for model parameters, leading to an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), integrating the parameter-free motion attention module with local and global multi-head attention based on object-centric and scene-centric video representations. The synergistic combination of local motion information, extracted by the proposed PMAM, with long-range interactions modeled by the local and global multi-head attention mechanism, can significantly enhance the performance of video summarization. Extensive experimental results on the benchmark datasets, SumMe and TVSum, demonstrate that the proposed MMAN outperforms other state-of-the-art methods, resulting in remarkable performance gains.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"2015 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3654670\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3654670","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Effective Video Summarization by Extracting Parameter-free Motion Attention
Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information which can not be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially contained in adjacent video frames, using a multi-head attention architecture. The PMAM requires no additional training for model parameters, leading to an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), integrating the parameter-free motion attention module with local and global multi-head attention based on object-centric and scene-centric video representations. The synergistic combination of local motion information, extracted by the proposed PMAM, with long-range interactions modeled by the local and global multi-head attention mechanism, can significantly enhance the performance of video summarization. Extensive experimental results on the benchmark datasets, SumMe and TVSum, demonstrate that the proposed MMAN outperforms other state-of-the-art methods, resulting in remarkable performance gains.
期刊介绍:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.