SkatingVerse: A large-scale benchmark for comprehensive evaluation on human action understanding

Authors: Ziliang Gan, Lei Jin, Yi Cheng, Yu Cheng, Yinglei Teng, Zun Li, Yawen Li, Wenhan Yang, Zheng Zhu, Junliang Xing, Jian Zhao
Journal: IET Computer Vision, vol. 18, no. 7, pp. 888-906
DOI: 10.1049/cvi2.12287
Published: 2024-05-30
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12287

Abstract

Human action understanding (HAU) is a broad topic that encompasses specific tasks such as action localisation, recognition, and assessment. However, most popular HAU datasets are bound to a single task based on particular actions. Combining different but related HAU tasks into a unified action understanding system is challenging because the actions differ across datasets. A large-scale, comprehensive benchmark, SkatingVerse, is constructed for action recognition, segmentation, proposal, and assessment. SkatingVerse focuses on fine-grained sport actions; figure skating is chosen as the task domain because it eliminates the object, scene, and space biases present in most previous datasets. In addition, skating actions are inherently complex and mutually similar, which poses an enormous challenge for current algorithms. A total of 1687 official figure skating competition videos were collected, totalling 184.4 h, more than four times the duration of other datasets on a similar topic. SkatingVerse makes it possible to formulate a unified task that outputs fine-grained human action classification and assessment results from a raw figure skating competition video. Its large scale and abundant categories also make it well suited to the study of HAU foundation models. Moreover, an image modality is incorporated into SkatingVerse for the human pose estimation task. Extensive experimental results show that (1) SkatingVerse significantly helps the training and evaluation of HAU methods, (2) existing HAU methods have substantial room for improvement, and SkatingVerse helps to reduce such gaps, and (3) unifying related HAU tasks through a single dataset can enable more practical applications. SkatingVerse will be publicly released to facilitate further studies on related problems.
Citations: 0
Journal overview:
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal's vision is to publish the highest-quality research that is relevant and topical to the field, while also welcoming works that introduce new horizons and set the agenda for future avenues of research in computer vision.
IET Computer Vision welcomes submissions on the following topics:
Biologically and perceptually motivated approaches to low level vision (feature detection, etc.)
Perceptual grouping and organisation
Representation, analysis and matching of 2D and 3D shape
Shape-from-X
Object recognition
Image understanding
Learning with visual inputs
Motion analysis and object tracking
Multiview scene analysis
Cognitive approaches in low, mid and high level vision
Control in visual systems
Colour, reflectance and light
Statistical and probabilistic models
Face and gesture
Surveillance
Biometrics and security
Robotics
Vehicle guidance
Automatic model acquisition
Medical image analysis and understanding
Aerial scene analysis and remote sensing
Deep learning models in computer vision
Both methodological and applications orientated papers are welcome.
Submitted manuscripts are expected to include a detailed and analytical review of the literature and the state of the art, an exposition of the proposed original research and its methodology, a thorough experimental evaluation, and, last but not least, a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to authors without being sent for review.
Special Issues Current Call for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf