Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video

IF 6.6 · CAS Region 1 (Computer Science) · JCR Q1 (Multidisciplinary) · Tsinghua Science and Technology · Pub Date: 2024-09-11 · DOI: 10.26599/TST.2024.9010041
Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu
{"title":"基于全局时空信息编码器-解码器的无剪辑视频中的动作分割","authors":"Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu","doi":"10.26599/TST.2024.9010041","DOIUrl":null,"url":null,"abstract":"Action segmentation has made significant progress, but segmenting and recognizing actions from untrimmed long videos remains a challenging problem. Most state-of-the-art methods focus on designing models based on temporal convolution. However, the limitations of modeling long-term temporal dependencies and the inflexibility of temporal convolutions restrict the potential of these models. To address the issue of over-segmentation in existing action segmentation methods, which leads to classification errors and reduced segmentation quality, this paper proposes a global spatial-temporal information encoder-decoder based action segmentation method. The method proposed in this paper uses the global temporal information captured by refinement layer to assist the Encoder-Decoder (ED) structure in judging the action segmentation point more accurately and, at the same time, suppress the excessive segmentation phenomenon caused by the ED structure. The method proposed in this paper achieves 93% frame accuracy on the constructed real Tai Chi action dataset. The experimental results prove that this method can accurately and efficiently complete the long video action segmentation task.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 1","pages":"290-302"},"PeriodicalIF":6.6000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10676351","citationCount":"0","resultStr":"{\"title\":\"Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video\",\"authors\":\"Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu\",\"doi\":\"10.26599/TST.2024.9010041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Action segmentation has made significant progress, but segmenting and recognizing actions from untrimmed long videos remains a challenging problem. Most state-of-the-art methods focus on designing models based on temporal convolution. However, the limitations of modeling long-term temporal dependencies and the inflexibility of temporal convolutions restrict the potential of these models. To address the issue of over-segmentation in existing action segmentation methods, which leads to classification errors and reduced segmentation quality, this paper proposes a global spatial-temporal information encoder-decoder based action segmentation method. The method proposed in this paper uses the global temporal information captured by refinement layer to assist the Encoder-Decoder (ED) structure in judging the action segmentation point more accurately and, at the same time, suppress the excessive segmentation phenomenon caused by the ED structure. The method proposed in this paper achieves 93% frame accuracy on the constructed real Tai Chi action dataset. 
The experimental results prove that this method can accurately and efficiently complete the long video action segmentation task.\",\"PeriodicalId\":48690,\"journal\":{\"name\":\"Tsinghua Science and Technology\",\"volume\":\"30 1\",\"pages\":\"290-302\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2024-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10676351\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Tsinghua Science and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10676351/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Multidisciplinary\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Tsinghua Science and Technology","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10676351/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Multidisciplinary","Score":null,"Total":0}
Citations: 0

Abstract

Action segmentation has made significant progress, but segmenting and recognizing actions in untrimmed long videos remains a challenging problem. Most state-of-the-art methods design models based on temporal convolution; however, their limited ability to model long-term temporal dependencies and the inflexibility of temporal convolutions restrict their potential. To address over-segmentation in existing action segmentation methods, which causes classification errors and degrades segmentation quality, this paper proposes an action segmentation method based on a global spatial-temporal information encoder-decoder. The proposed method uses the global temporal information captured by a refinement layer to help the Encoder-Decoder (ED) structure locate action segmentation points more accurately while suppressing the over-segmentation that the ED structure itself tends to produce. The method achieves 93% frame accuracy on a purpose-built dataset of real Tai Chi actions. The experimental results show that the method completes the long-video action segmentation task accurately and efficiently.
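To make the architecture described above concrete, below is a minimal PyTorch sketch of a temporal encoder-decoder over per-frame features, followed by a refinement stage that injects a global temporal summary to stabilize segment boundaries. This is a sketch of the general idea, not the authors' implementation: all module names, kernel sizes, dimensions, and the use of mean pooling as the "global temporal information" are illustrative assumptions.

```python
# Sketch (not the authors' code): temporal encoder-decoder over per-frame
# features, plus a refinement layer that injects a global temporal context
# vector. All names, kernel sizes, and dimensions are illustrative.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_dim=2048, hid=64, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(  # temporal pooling halves the length twice
            nn.Conv1d(in_dim, hid, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(hid, hid, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.dec = nn.Sequential(  # upsample back to the original length
            nn.Upsample(scale_factor=2), nn.Conv1d(hid, hid, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv1d(hid, hid, 3, padding=1), nn.ReLU(),
        )
        self.cls = nn.Conv1d(hid, n_classes, 1)

    def forward(self, x):            # x: (batch, in_dim, T), T divisible by 4
        h = self.dec(self.enc(x))
        return self.cls(h), h        # per-frame logits and features

class GlobalRefinement(nn.Module):
    """Refine ED logits using a global temporal summary of the whole video."""
    def __init__(self, hid=64, n_classes=10):
        super().__init__()
        self.proj = nn.Conv1d(n_classes, hid, 1)
        self.out = nn.Conv1d(2 * hid, n_classes, 1)

    def forward(self, logits, feats):
        h = self.proj(logits.softmax(dim=1))
        g = feats.mean(dim=2, keepdim=True).expand_as(h)  # global context
        return self.out(torch.cat([h, g], dim=1))         # refined logits

# Usage: per-frame backbone features -> ED -> global refinement.
x = torch.randn(1, 2048, 256)                 # 256 frames of toy features
ed, ref = EncoderDecoder(), GlobalRefinement()
logits, feats = ed(x)
refined = ref(logits, feats)                  # (1, n_classes, 256)
```

The design intent the sketch tries to capture: the ED path produces sharp but sometimes fragmented per-frame predictions, and the refinement path conditions every frame on a whole-video summary, which discourages spurious short segments.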
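The 93% figure quoted in the abstract is frame-wise accuracy, i.e., the fraction of frames whose predicted label matches the ground-truth label. A minimal sketch of that computation follows; the function name and the (batch, classes, frames) tensor layout are assumptions for illustration.

```python
# Frame-wise accuracy: fraction of frames whose predicted class label
# matches the ground truth. Layout (batch, classes, frames) is assumed.
import torch

def frame_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    pred = logits.argmax(dim=1)               # (batch, frames)
    return (pred == labels).float().mean().item()

logits = torch.randn(1, 10, 256)              # toy per-frame class scores
labels = torch.randint(0, 10, (1, 256))       # toy ground-truth labels
print(f"frame accuracy: {frame_accuracy(logits, labels):.2%}")
```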
Source Journal
Tsinghua Science and Technology
JCR Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 10.20
Self-citation rate: 10.60%
Annual article output: 2340
Journal Introduction: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and published bimonthly. The journal presents up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.