End-to-End Streaming Video Temporal Action Segmentation With Reinforcement Learning

IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 8, pp. 15449-15462 | Pub Date: 2025-04-11 | DOI: 10.1109/TNNLS.2025.3550910
Jin-Rong Zhang;Wu-Jun Wen;Sheng-Lan Liu;Gao Huang;Yun-Heng Li;Qi-Feng Li;Lin Feng
{"title":"End-to-End Streaming Video Temporal Action Segmentation With Reinforcement Learning","authors":"Jin-Rong Zhang;Wu-Jun Wen;Sheng-Lan Liu;Gao Huang;Yun-Heng Li;Qi-Feng Li;Lin Feng","doi":"10.1109/TNNLS.2025.3550910","DOIUrl":null,"url":null,"abstract":"The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding. Existing TAS methods are constrained to offline scenarios due to their heavy reliance on multimodal features and complete contextual information. The STAS task requires the model to classify each frame of the entire untrimmed video sequence clip by clip in time, thereby extending the applicability of TAS methods to online scenarios. However, directly applying existing TAS methods to SATS tasks results in significantly poor segmentation outcomes. In this article, we thoroughly analyze the fundamental differences between STAS tasks and TAS tasks, attributing the severe performance degradation when transferring models to model bias and optimization dilemmas. We introduce an end-to-end streaming video TAS model with reinforcement learning (SVTAS-RL). The end-to-end modeling method mitigates the modeling bias introduced by the change in task nature and enhances the feasibility of online solutions. Reinforcement learning (RL) is utilized to alleviate the optimization dilemma. Through extensive experiments, the SVTAS-RL model significantly outperforms existing STAS models and achieves competitive performance to the state-of-the-art (SOTA) TAS model on multiple datasets under the same evaluation criteria, demonstrating notable advantages on the ultralong video dataset EGTEA. Our code is publicly available at <uri>https://github.com/Thinksky5124/SVTAS</uri>.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"36 8","pages":"15449-15462"},"PeriodicalIF":8.9000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10963907/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding. Existing TAS methods are constrained to offline scenarios due to their heavy reliance on multimodal features and complete contextual information. The STAS task requires the model to classify each frame of an entire untrimmed video sequence clip by clip, in temporal order, thereby extending the applicability of TAS methods to online scenarios. However, directly applying existing TAS methods to STAS tasks results in significantly poor segmentation outcomes. In this article, we thoroughly analyze the fundamental differences between STAS tasks and TAS tasks, attributing the severe performance degradation observed when transferring models to modeling bias and optimization dilemmas. We introduce an end-to-end streaming video TAS model with reinforcement learning (SVTAS-RL). The end-to-end modeling method mitigates the modeling bias introduced by the change in task nature and enhances the feasibility of online solutions. Reinforcement learning (RL) is utilized to alleviate the optimization dilemma. Extensive experiments show that the SVTAS-RL model significantly outperforms existing STAS models and achieves performance competitive with the state-of-the-art (SOTA) TAS model on multiple datasets under the same evaluation criteria, demonstrating notable advantages on the ultralong video dataset EGTEA. Our code is publicly available at https://github.com/Thinksky5124/SVTAS.
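For readers new to the streaming setting, the sketch below illustrates the clip-by-clip inference protocol the abstract describes: per-frame features arrive one clip at a time, a recurrent memory carries past context forward, and every frame is labeled without access to future clips. This is a minimal PyTorch illustration under assumed names (StreamingSegmenter, stream_segment, feat_dim, and so on), not the authors' SVTAS-RL architecture.

```python
from typing import Iterator, List, Optional, Tuple

import torch
import torch.nn as nn


class StreamingSegmenter(nn.Module):
    """Toy causal segmenter: a frame encoder plus a GRU whose hidden
    state carries context across clips, so no future frame is used."""

    def __init__(self, feat_dim: int = 512, hidden: int = 256, n_classes: int = 19):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden)  # stand-in for a visual backbone
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(
        self, clip: torch.Tensor, state: Optional[torch.Tensor]
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # clip: (1, T_clip, feat_dim) -- per-frame features of one clip
        x = torch.relu(self.encoder(clip))
        out, state = self.rnn(x, state)  # hidden state acts as streaming memory
        return self.head(out), state     # per-frame logits, updated memory


@torch.no_grad()
def stream_segment(model: StreamingSegmenter,
                   clips: Iterator[torch.Tensor]) -> List[int]:
    """Label every frame of an untrimmed video clip by clip, online."""
    model.eval()
    state: Optional[torch.Tensor] = None
    labels: List[int] = []
    for clip in clips:  # clips arrive in temporal order; none is revisited
        logits, state = model(clip, state)
        labels.extend(logits.argmax(dim=-1).squeeze(0).tolist())
    return labels
```

The abstract further credits RL with alleviating the optimization dilemma. One illustrative reading (again an assumption, not the paper's actual algorithm) is a REINFORCE-style update in which sampled per-frame labels earn a single sequence-level reward from a non-differentiable segmentation score such as segmental F1:

```python
def reinforce_step(model: StreamingSegmenter,
                   optimizer: torch.optim.Optimizer,
                   clips: List[torch.Tensor],
                   gt_labels: List[int],
                   score_fn) -> float:
    """One hypothetical policy-gradient step; reward = segmentation score."""
    model.train()
    state, log_probs, preds = None, [], []
    for clip in clips:
        logits, state = model(clip, state)
        dist = torch.distributions.Categorical(logits=logits.squeeze(0))
        action = dist.sample()                        # sampled per-frame labels
        log_probs.append(dist.log_prob(action).sum())
        preds.extend(action.tolist())
    reward = score_fn(preds, gt_labels)               # sequence-level, non-differentiable
    loss = -reward * torch.stack(log_probs).sum()     # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

The point of the sketch is that a sequence-level metric which cannot be backpropagated directly can still supervise the streaming classifier through the policy gradient, which suggests why RL is attractive for the optimization dilemma the abstract names.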
Source Journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
Journal introduction: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.