Human-Centric Fine-Grained Action Quality Assessment
Jinglin Xu; Sibo Yin; Yuxin Peng
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6242-6255
Published: 2025-04-01
DOI: 10.1109/TPAMI.2025.3556935
https://ieeexplore.ieee.org/document/10946879/
Citations: 0
Abstract
Existing action quality assessment (AQA) methods mainly learn deep representations at the video level to score diverse actions. Lacking a fine-grained understanding of the actions in videos, they suffer from low credibility and accuracy and are thus insufficient for stringent applications such as competitive sports and sports injury rehabilitation. We argue that a fine-grained understanding of actions requires the model to parse actions in semantics, time, and space, which is the key to the credibility and accuracy of AQA techniques. Based on this insight, we propose a new human-centric fine-grained action quality assessment method, a Unified Fine-grained spatial-temporal action Parser named Uni-FineParser. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in semantics, time, and space, minimizing the impact of irrelevant backgrounds during assessment. In addition, we construct human-centric foreground action mask annotations for the FineDiving, AQA-7, and MTL-AQA datasets, called FineDiving-HM, AQA-7-HM, and MTL-AQA-HM, respectively. With refined spatio-temporal annotations on diverse target action procedures, Uni-FineParser offers the potential for human-centric fine-grained action quality assessment with better interpretability. Through extensive experiments, we demonstrate the effectiveness of Uni-FineParser, which outperforms state-of-the-art methods while supporting a broader range of human-centric action understanding tasks.
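The abstract does not specify implementation details, but its core mechanism is suppressing background regions with a human foreground mask before scoring. A minimal sketch of that idea, using a hypothetical masked average pooling over per-frame features (the function name, shapes, and toy data are illustrative assumptions, not the paper's actual architecture), might look like:

```python
import numpy as np

def foreground_pool(frame_features, human_mask):
    """Masked average pooling (illustrative sketch, not the paper's method):
    aggregate only features inside the human foreground region so the
    background contributes nothing to the pooled descriptor."""
    # frame_features: (H, W, C) float array; human_mask: (H, W) binary array
    mask = human_mask.astype(frame_features.dtype)[..., None]  # (H, W, 1)
    masked = frame_features * mask                # zero out background cells
    denom = mask.sum()                            # number of foreground cells
    return masked.sum(axis=(0, 1)) / max(denom, 1.0)

# Toy example: 4x4 frame, 2 channels; foreground occupies the top-left 2x2.
feats = np.ones((4, 4, 2))
feats[:2, :2, :] = 3.0                            # foreground feature values
mask = np.zeros((4, 4))
mask[:2, :2] = 1
print(foreground_pool(feats, mask))               # → [3. 3.]
```

Plain average pooling over the same frame would yield 1.5 per channel; masking recovers the pure foreground value of 3.0, which is the intuition behind reducing the impact of irrelevant backgrounds during assessment.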