Localization and recognition of human action in 3D using transformers

Communications Engineering · Published 2024-09-03 · DOI: 10.1038/s44172-024-00272-7

Jiankai Sun, Linjiang Huang, Hongsong Wang, Chuanyang Zheng, Jianing Qiu, Md Tauhidul Islam, Enze Xie, Bolei Zhou, Lei Xing, Arjun Chandrasekaran, Michael J. Black

Pages: 1-15
Article: https://www.nature.com/articles/s44172-024-00272-7
Full text (PDF, PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11372174/pdf/

Abstract

Understanding a person's behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing which actions a person performs and when each action occurs in the sequence. To advance research in 3D action localization, we introduce BABEL-TAL (BT), a new, challenging, and more complex benchmark dataset for this task. On this benchmark we carefully establish important baselines and evaluation metrics, and we also conduct human evaluations. We further propose a strong baseline model, Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance while using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.

Jiankai Sun, Michael J. Black and colleagues present a benchmark for human movement analysis. Their transformer-based approach, LocATe, learns to perform both temporal action localization and recognition.
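The abstract defines 3D action localization as predicting both which action occurs and when. It does not say which evaluation metrics BABEL-TAL uses, so the following is only a hedged illustration: a minimal Python sketch of temporal intersection-over-union (tIoU) and threshold-based matching of predicted segments to ground truth, which is the building block of the mean average precision (mAP) commonly reported in temporal action localization. The `Segment` type, the frame units, and the greedy `count_true_positives` helper are hypothetical and are not taken from LocATe or BABEL-TAL.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical action segment: a class label and a [start, end) interval in frames."""
    label: str
    start: float
    end: float


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two intervals; labels are ignored here."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(
    preds: List[Tuple[float, Segment]],
    gts: List[Segment],
    threshold: float = 0.5,
) -> int:
    """Assumed greedy scheme: visit predictions from highest to lowest score; each one
    may claim at most one still-unmatched ground-truth segment with the same label
    and tIoU >= threshold (the best-overlapping one)."""
    matched: Set[int] = set()
    tp = 0
    for _score, pred in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou = 0.0
        best_idx: Optional[int] = None
        for i, gt in enumerate(gts):
            # Skip ground truth already claimed or with a different action class.
            if i in matched or gt.label != pred.label:
                continue
            iou = temporal_iou(pred, gt)
            if iou >= threshold and iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    return tp


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    gts = [Segment("walk", 0, 100), Segment("sit", 120, 200)]
    preds = [
        (0.9, Segment("walk", 10, 100)),
        (0.8, Segment("sit", 150, 260)),
        (0.3, Segment("walk", 0, 50)),
    ]
    print(temporal_iou(preds[0][1], gts[0]))
    print(count_true_positives(preds, gts, threshold=0.5))
    print(count_true_positives(preds, gts, threshold=0.3))
```

In this sketch, precision and recall at a given tIoU threshold follow directly from the true-positive count; mAP additionally integrates precision over the score-ranked predictions and averages over action classes (and often over several tIoU thresholds).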
