Not all samples are equal: Boosting action segmentation via selective incremental learning

Impact Factor: 8.0 | CAS Zone 2 (Computer Science) | JCR Q1 (Automation & Control Systems) | Engineering Applications of Artificial Intelligence | Pub Date: 2025-02-26 | DOI: 10.1016/j.engappai.2025.110334

Feng Huang, Xiao-Diao Chen, Wen Wu, Weiyin Ma

Engineering Applications of Artificial Intelligence, Volume 147, Article 110334. Publication type: Journal Article. Open access: no. Source: https://www.sciencedirect.com/science/article/pii/S0952197625003343

Citations: 0

Abstract

Temporal action segmentation (TAS) seeks to classify each frame in a video. Existing methods tend to design diverse network architectures while overlooking the intrinsic characteristics of training samples. Notably, two key issues arise: (1) frames around action boundaries are more ambiguous and thus pose greater difficulties for training than other frames; and (2) beyond the commonly used categorical labels, the total number of action instances within a video may serve as an additional, potentially vital, supervision cue. To address these issues, this paper introduces a novel method that combines a model-agnostic training strategy with an instance number alignment loss, designed to enhance the performance of existing models. Specifically, a selective incremental learning (SIL) strategy is proposed to alleviate the impact of noisy samples by progressively training the model in an easy-to-difficult manner through a dynamic sample selection mechanism. Furthermore, an instance number alignment loss (INAL) is developed to capture both global and local features simultaneously by incorporating a multi-task learning module. Extensive evaluations are conducted on three benchmark datasets, namely 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast. The experimental results demonstrate that the proposed method achieves substantial performance improvements over state-of-the-art approaches.
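The abstract does not give the selection rule or the form of the loss, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes that frame difficulty can be approximated by distance to the nearest label change, and that frames close to a boundary are held out in early epochs and admitted as training progresses. The function names, the linear schedule, and the `max_margin` parameter are all hypothetical. `count_instances` shows how the instance-number target could be derived from frame labels; the alignment loss itself is not sketched because the abstract does not specify it.

```python
from typing import List


def count_instances(labels: List[int]) -> int:
    """Count action instances as maximal runs of identical frame labels."""
    if not labels:
        return 0
    return 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def boundary_distance(labels: List[int]) -> List[int]:
    """Distance in frames from each frame to the nearest label change."""
    n = len(labels)
    # A frame is a boundary frame if its label differs from either neighbour.
    is_boundary = [
        (i > 0 and labels[i] != labels[i - 1])
        or (i < n - 1 and labels[i] != labels[i + 1])
        for i in range(n)
    ]
    far = n  # larger than any real distance inside the video
    dist = [0 if b else far for b in is_boundary]
    for i in range(1, n):  # propagate distances left to right
        dist[i] = min(dist[i], dist[i - 1] + 1)
    for i in range(n - 2, -1, -1):  # propagate distances right to left
        dist[i] = min(dist[i], dist[i + 1] + 1)
    return dist


def select_frames(
    labels: List[int], epoch: int, total_epochs: int, max_margin: int
) -> List[bool]:
    """Hypothetical easy-to-difficult schedule (an assumption, not from the paper).

    Frames closer than `margin` to a boundary are excluded; the margin
    shrinks linearly from `max_margin` to 0 over training.
    """
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    margin = round(max_margin * (1.0 - progress))
    return [d >= margin for d in boundary_distance(labels)]


labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]
for epoch in range(3):
    mask = select_frames(labels, epoch, total_epochs=3, max_margin=2)
    print(epoch, sum(mask), "of", len(labels), "frames selected")
print("instances:", count_instances(labels))
```

In a training loop, such a mask would typically be applied to the per-frame loss so that excluded frames contribute nothing in early epochs. A learned or loss-based difficulty measure could replace the distance heuristic without changing the overall structure.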


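Source journal

Engineering Applications of Artificial Intelligence (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days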

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution and has seen remarkable advances across machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.